Highly Scalable GPU-Based Solver

Computational Fluid Dynamics (CFD) is widely used in science and engineering to investigate fluid motion and its interaction with defined boundaries. In CFD, the Navier-Stokes equations that govern fluid motion are discretized into linear systems of millions of equations, and solving large-scale problems remains computationally challenging. Current solvers scale well on linear systems that can be partitioned into independent subsystems. In this project, we present a GPU-based, scalable Bi-Conjugate Gradient Stabilized (GBCG) solver that can be used to solve a wide range of banded linear systems. We use a novel row-oriented matrix decomposition method to divide the banded linear system into several correlated sub-linear systems and solve them collaboratively on multiple GPUs. We design a number of GPU and MPI optimizations to speed up inter-GPU and inter-machine communication. The solver achieves a speedup of more than 21 times when scaling from 6 to 192 GPUs on the XSEDE Keeneland supercomputer and, because of its small communication overhead, can scale up to 32 GPUs on Amazon EC2 over a relatively slow Ethernet network.
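To make the idea concrete, the following is a minimal single-process sketch in Python/NumPy, not the GBCG implementation itself. It assumes a plain contiguous row-block partition as a stand-in for the row-oriented decomposition, whose details are not given here, and runs textbook unpreconditioned BiCGStab on top of it. The names banded_matrix, row_block_matvec, and bicgstab are hypothetical and chosen only for illustration. Each row block needs its own entries of the vector plus a "halo" of bandwidth entries from each neighbor; in a multi-GPU setting that halo is what would have to be exchanged between devices.

```python
import numpy as np

def banded_matrix(n, bandwidth, seed=0):
    """Dense, strictly diagonally dominant test matrix with the given half-bandwidth."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - bandwidth), min(n, i + bandwidth + 1)
        A[i, lo:hi] = rng.uniform(-1.0, 1.0, hi - lo)
        A[i, i] = 2.0 * bandwidth + 2.0  # dominates the off-diagonal row sum
    return A

def row_block_matvec(A, x, num_blocks, bandwidth):
    """Compute A @ x one row block at a time.

    Each block stands in for one device (an assumption of this sketch).
    It reads only x[lo:hi]: its own rows plus a halo of `bandwidth`
    entries on each side, which is the data neighbors would exchange.
    """
    n = len(x)
    bounds = np.linspace(0, n, num_blocks + 1).astype(int)
    y = np.empty(n)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        lo, hi = max(0, start - bandwidth), min(n, stop + bandwidth)
        y[start:stop] = A[start:stop, lo:hi] @ x[lo:hi]
    return y

def bicgstab(matvec, b, tol=1e-10, max_iter=500):
    """Textbook unpreconditioned BiCGStab; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    r_hat = r.copy()              # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = matvec(p)
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * b_norm:   # converged after the half step
            return x + alpha * p, k
        t = matvec(s)
        omega = (t @ s) / (t @ t)
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        rho = rho_new
    return x, max_iter

# Example run: 200 unknowns, half-bandwidth 3, rows split into 4 blocks.
n, bandwidth, num_blocks = 200, 3, 4
A = banded_matrix(n, bandwidth)
b = np.ones(n)
x, iterations = bicgstab(lambda z: row_block_matvec(A, z, num_blocks, bandwidth), b)
relative_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

In this sketch the blocks are "correlated" only through the halo entries, so the data each block needs from its neighbors grows with the bandwidth rather than with the problem size. The dot products inside bicgstab would additionally require a global reduction across devices in a distributed version; that step is not modeled here.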

The movie above shows one heartbeat cycle of the left ventricle, with an isosurface of the vortical structure, streamlines, and velocity vectors.