Computational Fluid
Dynamics (CFD) is widely used in scientific and
engineering fields to investigate fluid motion and its
interactions with certain defined boundaries. In
CFD, the Navier-Stokes equations that govern fluid
motion are discretized into linear systems of millions of
equations, and solving large-scale problems remains
computationally challenging. Current solvers achieve
good scalability on linear systems that can be
partitioned into independent subsystems. In this
project, we present a GPU-based, scalable Bi-Conjugate
Gradient Stabilized (GBCG) solver that can be used to
solve a wide range of banded linear systems. We utilize a
novel row-oriented matrix decomposition method to divide
the banded linear system into several correlated
linear subsystems and solve them on multiple GPUs
collaboratively. We design a number of GPU and MPI
optimizations to speed up inter-GPU and inter-machine
communications. The solver achieves a speedup of more than
21 times when scaling from 6 to 192 GPUs on the XSEDE Keeneland
supercomputer and, because of its small communication overhead,
can scale up to 32 GPUs on Amazon EC2 over a relatively slow
Ethernet network.
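
As a rough illustration of the computation described above, the following
Python sketch applies an unpreconditioned BiCGSTAB iteration to a small banded
system whose matrix-vector product is evaluated one row block at a time. It is
not the GBCG implementation: it runs in a single NumPy process with no GPU or
MPI code, each row block merely stands in for one device, and the function
names (`make_banded`, `blocked_matvec`, `bicgstab`), the matrix entries, and
the block count are all assumptions chosen for the example.

```python
import numpy as np

def make_banded(n, offsets, values):
    """Dense n x n matrix with constant diagonals (illustration only)."""
    A = np.zeros((n, n))
    for off, val in zip(offsets, values):
        A += val * np.eye(n, k=off)
    return A

def blocked_matvec(A, x, num_blocks, half_bandwidth):
    """Compute A @ x one row block at a time.

    Each row block stands in for one device. It reads its own slice of x
    plus a halo of `half_bandwidth` entries on each side, which is the data
    that neighbouring blocks would have to exchange.
    """
    n = A.shape[0]
    y = np.empty(n)
    bounds = np.linspace(0, n, num_blocks + 1).astype(int)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        c0 = max(lo - half_bandwidth, 0)   # left edge of the halo
        c1 = min(hi + half_bandwidth, n)   # right edge of the halo
        y[lo:hi] = A[lo:hi, c0:c1] @ x[c0:c1]
    return y

def bicgstab(matvec, b, tol=1e-10, max_iter=500):
    """Textbook unpreconditioned BiCGSTAB; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    r_hat = r.copy()                       # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = matvec(p)
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * b_norm:
            return x + alpha * p, k
        t = matvec(s)
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        rho = rho_new
    return x, max_iter

# Hypothetical test problem: a nonsymmetric, diagonally dominant
# pentadiagonal matrix split into 4 row blocks.
n, half_bandwidth, num_blocks = 200, 2, 4
A = make_banded(n, [-2, -1, 0, 1, 2], [-1.0, -1.5, 6.0, -0.5, -1.0])
b = np.ones(n)
x_sol, iters = bicgstab(
    lambda vec: blocked_matvec(A, vec, num_blocks, half_bandwidth), b)
print("iterations:", iters)
print("relative residual:", np.linalg.norm(A @ x_sol - b) / np.linalg.norm(b))
```

In this sketch the only data a row block needs from outside its own rows is
the halo of `half_bandwidth` vector entries on each side, which suggests why a
row-oriented split of a banded matrix can keep communication between
neighbouring devices small. The dot products in `bicgstab` would additionally
require a global reduction in a distributed setting.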