Title: GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method

Conference

GPU performance of the lattice Boltzmann method (LBM) depends heavily on memory access patterns. When LBM is advanced with GPUs on complex computational domains, geometric data is typically accessed indirectly, and lattice data is typically accessed lexicographically in the Structure of Arrays (SoA) layout. Although a variety of access patterns exist beyond these typical choices, no study has yet examined their relative efficacy. Here, we compare a suite of memory access schemes via empirical testing and performance modeling. We find strong evidence that semi-direct addressing is the superior addressing scheme for the majority of cases examined: semi-direct addressing increases computational speed and often reduces memory consumption. For lattice layout, we find that the Collected Structure of Arrays (CSoA) layout outperforms the SoA layout. Compared to state-of-the-art practices, our recommended addressing modifications yield performance gains of 10-40% across different complex geometries, fluid volume fractions, and resolutions. The modifications also reduce memory consumption by as much as 17%. Having discovered these improvements, we examine a highly resolved arterial geometry on a leadership-class system. On this system we present the first near-optimal strong scaling results for LBM with arterial geometries run on GPUs. We also demonstrate that the above recommendations remain valid for large-scale, many-device simulations, where they increase computational speed and reduce average memory usage. To understand these observations, we employ performance modeling, which reveals that semi-direct methods outperform indirect methods due to a reduced number of total loads/stores in memory, and that CSoA outperforms SoA and bundling due to improved caching behavior.
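
The two lattice layouts named in the abstract can be illustrated by how each maps a (fluid node, velocity direction) pair to a flat array offset. The sketch below is not taken from the paper. It assumes that CSoA means grouping nodes into fixed-size chunks and storing each chunk as a small SoA, and it assumes a chunk size of 32 (a typical GPU warp width). The names `soa_index`, `csoa_index`, `kNodes`, and `kBlock` are hypothetical.

```cpp
#include <cstddef>

// Number of discrete velocities per node in the D3Q19 lattice.
constexpr std::size_t Q = 19;

// SoA: all nodes' values for direction q are stored contiguously,
// so consecutive directions of one node are n_nodes elements apart.
constexpr std::size_t soa_index(std::size_t node, std::size_t q,
                                std::size_t n_nodes) {
    return q * n_nodes + node;
}

// CSoA (assumed blocked form): nodes are grouped into chunks of `block`
// nodes, and each chunk is laid out as a small SoA of Q * block elements.
// Consecutive directions of one node are only `block` elements apart.
// Assumes the node count is padded to a multiple of `block`.
constexpr std::size_t csoa_index(std::size_t node, std::size_t q,
                                 std::size_t block) {
    return (node / block) * (Q * block) + q * block + (node % block);
}

// Illustrative sizes only (assumptions, not values from the paper).
constexpr std::size_t kNodes = 64;
constexpr std::size_t kBlock = 32;

// Stride between directions q and q + 1 for the same node.
static_assert(soa_index(33, 3, kNodes) - soa_index(33, 2, kNodes) == kNodes,
              "SoA direction stride equals the node count");
static_assert(csoa_index(33, 3, kBlock) - csoa_index(33, 2, kBlock) == kBlock,
              "CSoA direction stride equals the chunk size");

// Neighboring nodes in a chunk stay adjacent in both layouts, which is
// what allows coalesced loads across the threads of a warp.
static_assert(soa_index(34, 2, kNodes) - soa_index(33, 2, kNodes) == 1, "");
static_assert(csoa_index(34, 2, kBlock) - csoa_index(33, 2, kBlock) == 1, "");
```

Under these assumptions, both layouts keep neighboring nodes adjacent for a fixed direction, but CSoA keeps all 19 values of a node within a single chunk of `Q * block` elements instead of spreading them across the whole array. This is one plausible reading of the improved caching behavior the abstract reports.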

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1474546
Resource Relation:
Conference: IEEE International Parallel and Distributed Processing Symposium (IPDPS) - Vancouver, Canada - 5/21/2018-5/25/2018
Country of Publication:
United States
Language:
English

Similar Records

Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm
Journal Article · Mar 09, 2021 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID:1474546

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)
Technical Report · Nov 29, 2019 · OSTI ID:1474546

A prospect for computing in porous materials research: Very large fluid flow simulations
Journal Article · Jan 01, 2016 · Journal of Computational Science · OSTI ID:1474546
