High performance parallel implicit CFD.
Abstract
Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating-point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.
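The per-node performance gap the abstract describes is typically attacked through data-layout techniques such as interlacing the unknowns at each grid vertex and treating the sparse Jacobian as small dense blocks, so each block multiply reuses cached entries. The following is an illustrative sketch of such a block-sparse-row matrix-vector product, not the authors' production code (which used PETSc); the helper name `bsr_matvec` and the tiny example data are hypothetical.

```python
import numpy as np

def bsr_matvec(indptr, indices, blocks, x, bs):
    """Matrix-vector product for a block-sparse-row matrix.

    Each nonzero is a dense bs-by-bs block coupling the bs unknowns
    stored contiguously (interlaced) at a pair of grid vertices, so
    every block multiply reuses the cached slice x[j*bs:(j+1)*bs].
    """
    nrows = len(indptr) - 1
    y = np.zeros(nrows * bs)
    for i in range(nrows):
        yi = y[i * bs:(i + 1) * bs]        # view into the result
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]                 # column (vertex) index
            yi += blocks[p] @ x[j * bs:(j + 1) * bs]
    return y

# Tiny 2-vertex example with 2 unknowns per vertex (bs = 2):
indptr = [0, 2, 3]                 # row 0 owns blocks 0..1, row 1 owns block 2
indices = [0, 1, 1]                # vertex index of each block
blocks = np.array([np.eye(2), 2 * np.eye(2), 3 * np.eye(2)])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = bsr_matvec(indptr, indices, blocks, x, 2)   # y == [7., 10., 9., 12.]
```

The inner product over a dense block is the unit of work here; with interlaced unknowns it touches consecutive memory, which is the access-ordering point the abstract makes.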
 Authors:
 Gropp, W. D.; Kaushik, D. K.; Keyes, D. E.; Smith, B. F.
 Publication Date:
 Mar. 2001
 Research Org.:
 Argonne National Lab. (ANL), Argonne, IL (United States)
 Sponsoring Org.:
 USDOE Office of Science (SC); National Science Foundation (NSF)
 OSTI Identifier:
 943180
 Report Number(s):
 ANL/MCS/JA38061
Journal ID: ISSN 0167-8191; PACOEJ; TRN: US201002%%657
 DOE Contract Number:
 DE-AC02-06CH11357
 Resource Type:
 Journal Article
 Resource Relation:
 Journal Name: Parallel Comput.; Journal Volume: 27; Journal Issue: 4; Mar. 2001
 Country of Publication:
 United States
 Language:
 ENGLISH
 Subject:
 97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; COMPUTER CODES; COMPUTERS; COMPUTERIZED SIMULATION; FLUID MECHANICS; NASA; PERFORMANCE
Citation Formats
Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, and Old Dominion Univ. High performance parallel implicit CFD. United States: N. p., 2001.
Web. doi:10.1016/S0167-8191(00)00075-2.
Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, & Old Dominion Univ. High performance parallel implicit CFD. United States. doi:10.1016/S0167-8191(00)00075-2.
Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, and Old Dominion Univ. 2001.
"High performance parallel implicit CFD." United States.
doi:10.1016/S0167-8191(00)00075-2.
@article{osti_943180,
title = {High performance parallel implicit CFD},
author = {Gropp, W. D. and Kaushik, D. K. and Keyes, D. E. and Smith, B. F. and Mathematics and Computer Science and Old Dominion Univ.},
abstractNote = {Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating-point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.},
doi = {10.1016/S0167-8191(00)00075-2},
journal = {Parallel Comput.},
number = {4},
volume = {27},
place = {United States},
year = {2001},
month = {3}
}

The authors describe a parallel alternating direction implicit (ADI) algorithm for the solution of the unsteady heat conduction equation discretized with fourth-order accuracy. A novel parallel pentadiagonal line-inversion procedure based on a divide-and-conquer strategy is used in conjunction with a domain-decomposition technique. The algorithm has been implemented on the CM-5 in MIMD mode, and its performance for varying grid sizes and numbers of processors is investigated.
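A fourth-order discretization couples five points along each grid line, so every ADI sweep solves pentadiagonal systems. The paper's divide-and-conquer parallel inversion is not reproduced here; as a hedged sketch under the assumption of diagonal dominance, the serial banded elimination each subdomain would perform looks like this (the name `penta_solve` and the coefficient layout are illustrative choices, not the authors'):

```python
def penta_solve(e, c, d, a, b, r):
    """Solve a pentadiagonal system by banded Gaussian elimination.

    Row i reads: e[i]*x[i-2] + c[i]*x[i-1] + d[i]*x[i]
                 + a[i]*x[i+1] + b[i]*x[i+2] = r[i].
    Out-of-range coefficients (e[0], e[1], c[0], a[n-1], b[n-2],
    b[n-1]) must be zero.  No pivoting: assumes diagonal dominance,
    as holds for the discretized heat-conduction operator.
    """
    n = len(d)
    e, c, d, a, b, r = (list(v) for v in (e, c, d, a, b, r))
    for k in range(n - 1):                # eliminate column k below the pivot
        m = c[k + 1] / d[k]
        d[k + 1] -= m * a[k]
        a[k + 1] -= m * b[k]
        r[k + 1] -= m * r[k]
        if k + 2 < n:
            m2 = e[k + 2] / d[k]
            c[k + 2] -= m2 * a[k]
            d[k + 2] -= m2 * b[k]
            r[k + 2] -= m2 * r[k]
    x = [0.0] * n                         # back substitution on the U factor
    x[n - 1] = r[n - 1] / d[n - 1]
    if n > 1:
        x[n - 2] = (r[n - 2] - a[n - 2] * x[n - 1]) / d[n - 2]
    for i in range(n - 3, -1, -1):
        x[i] = (r[i] - a[i] * x[i + 1] - b[i] * x[i + 2]) / d[i]
    return x
```

The forward sweep is inherently sequential along a line, which is exactly why a divide-and-conquer splitting across subdomains is needed for parallel execution.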

Parallel implicit unstructured grid Euler solvers
A mesh-vertex finite volume scheme for solving the Euler equations on triangular unstructured meshes is implemented on a multiple-instruction/multiple-data stream parallel computer. An explicit four-stage Runge-Kutta scheme is used to solve two-dimensional flow problems. A family of implicit schemes is also developed to solve these problems, where the linear system that arises at each time step is solved by a preconditioned GMRES algorithm. Two partitioning strategies are employed: one that partitions triangles and the other that partitions vertices. The choice of the preconditioner in a distributed memory setting is discussed. All of the methods are compared both in terms of …
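The implicit path above solves one linear system per time step with preconditioned GMRES. As a minimal sketch, not the cited solver: an unrestarted, left-preconditioned GMRES with point-Jacobi standing in for the distributed-memory preconditioner choices the abstract discusses (the function name `gmres` and all parameters are illustrative assumptions):

```python
import numpy as np

def gmres(A, b, M_inv=lambda v: v, tol=1e-10):
    """Unrestarted left-preconditioned GMRES with zero initial guess."""
    n = b.size
    Q = np.zeros((n, n + 1))              # Arnoldi basis vectors
    H = np.zeros((n + 1, n))              # Hessenberg matrix
    r0 = M_inv(b)                         # preconditioned residual at x0 = 0
    beta = np.linalg.norm(r0)
    Q[:, 0] = r0 / beta
    y = np.zeros(0)
    for k in range(n):
        v = M_inv(A @ Q[:, k])            # Arnoldi step on M^{-1} A
        for j in range(k + 1):            # modified Gram-Schmidt
            H[j, k] = Q[:, j] @ v
            v = v - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        Hk, e1 = H[:k + 2, :k + 1], np.zeros(k + 2)
        e1[0] = beta                      # min || beta*e1 - Hk y ||
        y, *_ = np.linalg.lstsq(Hk, e1, rcond=None)
        if H[k + 1, k] < 1e-14 or np.linalg.norm(Hk @ y - e1) < tol * beta:
            break
        Q[:, k + 1] = v / H[k + 1, k]
    return Q[:, :y.size] @ y

# Point-Jacobi preconditioner: scale by the inverse diagonal.
rng = np.random.default_rng(0)
A = rng.random((8, 8)) + 8 * np.eye(8)   # diagonally dominant test matrix
b = rng.random(8)
Dinv = 1.0 / np.diag(A)
x = gmres(A, b, M_inv=lambda v: Dinv * v)
```

In a production distributed-memory code the Jacobi step would be replaced by a block or overlapping-Schwarz preconditioner local to each partition, which is the trade-off the abstract alludes to.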
Parallel processing for implicit solutions of the Navier-Stokes equations
An implicit scheme employing superposition of two (three) one-dimensional solutions in two (three) dimensions is proposed for the solution of the Navier-Stokes equations. Boundary conditions are implemented in the discrete governing equations before the superposition takes place; hence no boundary conditions are needed for the one-dimensional solutions. Details of the method as well as numerical confirmation of the analytical developments are discussed. 9 refs.
Comparison of Four Parallel Algorithms For Domain Decomposed Implicit Monte Carlo
We consider two existing asynchronous parallel algorithms for Implicit Monte Carlo (IMC) thermal radiation transport on spatially decomposed meshes. The two algorithms are from the production codes KULL from Lawrence Livermore National Laboratory and Milagro from Los Alamos National Laboratory. Both algorithms were considered and analyzed in an implementation of the KULL IMC package in ALEGRA, a Sandia National Laboratory high energy density physics code. Improvements were made to both algorithms. The improved Milagro algorithm performed the best by scaling nearly perfectly out to 244 processors. 
Comparison of four parallel algorithms for domain decomposed implicit Monte Carlo.
We consider four asynchronous parallel algorithms for Implicit Monte Carlo (IMC) thermal radiation transport on spatially decomposed meshes. Two of the algorithms are from the production codes KULL from Lawrence Livermore National Laboratory and Milagro from Los Alamos National Laboratory. Improved versions of each of the existing algorithms are also presented. All algorithms were analyzed in an implementation of the KULL IMC package in ALEGRA, a Sandia National Laboratory high energy density physics code. The improved Milagro algorithm performed the best by scaling almost linearly out to 244 processors for well load-balanced problems.