OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High performance parallel implicit CFD.

Abstract

Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.
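To make the data-layout point concrete, the following is a minimal sketch, not code from the paper, of how reordering the vertices of an unstructured grid affects memory locality. It uses SciPy's reverse Cuthill-McKee ordering on a randomly generated sparse adjacency matrix standing in for a mesh; the random matrix, its size, and the bandwidth metric are illustrative assumptions, chosen only to show that reordering clusters each vertex's neighbors closer together in memory, which is the kind of access-ordering effect the abstract attributes the per-processor performance gap to.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Random symmetric sparse matrix standing in for an unstructured-grid
# vertex adjacency (illustrative only; not a real CFD mesh).
n = 2000
A = sp.random(n, n, density=0.002, format="csr", random_state=0)
A = (A + A.T + sp.identity(n)).tocsr()

def bandwidth(M):
    """Maximum |row - col| over the nonzeros of a sparse matrix."""
    coo = M.tocoo()
    return int(np.abs(coo.row - coo.col).max())

# Reverse Cuthill-McKee clusters each vertex's neighbors nearby in the
# ordering, improving cache reuse for indirectly addressed sweeps.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm, :][:, perm]

print("bandwidth, original ordering:", bandwidth(A))
print("bandwidth, RCM ordering:     ", bandwidth(A_rcm))

The reduced bandwidth after reordering means that an edge-based residual or flux sweep touches memory locations that are closer together, which is one way to raise the percentage of per-processor peak that such codes achieve.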

Authors:
Gropp, W. D.; Kaushik, D. K.; Keyes, D. E.; Smith, B. F.
Publication Date:
March 2001
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Science Foundation (NSF)
OSTI Identifier:
943180
Report Number(s):
ANL/MCS/JA-38061
Journal ID: ISSN 0167-8191; PACOEJ; TRN: US201002%%657
DOE Contract Number:
DE-AC02-06CH11357
Resource Type:
Journal Article
Resource Relation:
Journal Name: Parallel Computing; Journal Volume: 27; Journal Issue: 4; March 2001
Country of Publication:
United States
Language:
ENGLISH
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; ALGORITHMS; COMPUTER CODES; COMPUTERS; COMPUTERIZED SIMULATION; FLUID MECHANICS; NASA; PERFORMANCE

Citation Formats

Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, and Old Dominion Univ. High performance parallel implicit CFD. United States: N. p., 2001. Web. doi:10.1016/S0167-8191(00)00075-2.
Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, & Old Dominion Univ. High performance parallel implicit CFD. United States. doi:10.1016/S0167-8191(00)00075-2.
Gropp, W. D., Kaushik, D. K., Keyes, D. E., Smith, B. F., Mathematics and Computer Science, and Old Dominion Univ. 2001. "High performance parallel implicit CFD." United States. doi:10.1016/S0167-8191(00)00075-2.
@article{osti_943180,
title = {High performance parallel implicit CFD.},
author = {Gropp, W. D. and Kaushik, D. K. and Keyes, D. E. and Smith, B. F. and Mathematics and Computer Science and Old Dominion Univ.},
abstractNote = {Fluid dynamical simulations based on finite discretizations on (quasi-)static grids scale well in parallel, but execute at a disappointing percentage of per-processor peak floating point operation rates without special attention to layout and access ordering of data. We document both claims from our experience with an unstructured grid CFD code that is typical of the state of the practice at NASA. These basic performance characteristics of PDE-based codes can be understood with surprisingly simple models, for which we quote earlier work, presenting primarily experimental results. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per node performance. This snapshot of ongoing work updates our 1999 Bell Prize-winning simulation on ASCI computers.},
doi = {10.1016/S0167-8191(00)00075-2},
journal = {Parallel Comput.},
number = {4},
volume = 27,
place = {United States},
year = 2001,
month = mar
}
  • The authors describe a parallel alternate direction implicit (ADI) algorithm for the solution of the unsteady heat conduction equation discretized with fourth-order accuracy. A novel parallel pentadiagonal line-inversion procedure based on a divide-and-conquer strategy is used in conjunction with a domain-decomposition technique. The algorithm has been implemented on the CM-5 in the MIMD mode, and its performance for varying grid sizes and number of processors is investigated.
  • A mesh-vertex finite volume scheme for solving the Euler equations on triangular unstructured meshes is implemented on a multiple-instruction/multiple-data stream parallel computer. An explicit four-stage Runge-Kutta scheme is used to solve two-dimensional flow problems. A family of implicit schemes is also developed to solve these problems, where the linear system that arises at each time step is solved by a preconditioned GMRES algorithm (a minimal sketch of this step appears after this list). Two partitioning strategies are employed: one that partitions triangles and the other that partitions vertices. The choice of the preconditioner in a distributed memory setting is discussed. All of the methods are compared both in terms of elapsed times and convergence rates. It is shown that the implicit schemes offer adequate parallelism at the expense of minimal sequential overhead. The use of a global coarse grid to further minimize this overhead is also investigated. The schemes are implemented on a distributed memory parallel computer, the Intel iPSC/860. 23 refs.
  • An implicit scheme employing superposition of two (three) one-dimensional solutions in two (three) dimensions is proposed for the solution of the Navier-Stokes equations. Boundary conditions are implemented in the discrete governing equations before the superposition takes place; hence no boundary conditions are needed for the one-dimensional solutions. Details of the method as well as numerical confirmation of the analytical developments are discussed. 9 refs.
  • We consider two existing asynchronous parallel algorithms for Implicit Monte Carlo (IMC) thermal radiation transport on spatially decomposed meshes. The two algorithms are from the production codes KULL from Lawrence Livermore National Laboratory and Milagro from Los Alamos National Laboratory. Both algorithms were considered and analyzed in an implementation of the KULL IMC package in ALEGRA, a Sandia National Laboratory high energy density physics code. Improvements were made to both algorithms. The improved Milagro algorithm performed the best by scaling nearly perfectly out to 244 processors.
  • We consider four asynchronous parallel algorithms for Implicit Monte Carlo (IMC) thermal radiation transport on spatially decomposed meshes. Two of the algorithms are from the production codes KULL from Lawrence Livermore National Laboratory and Milagro from Los Alamos National Laboratory. Improved versions of each of the existing algorithms are also presented. All algorithms were analyzed in an implementation of the KULL IMC package in ALEGRA, a Sandia National Laboratory high energy density physics code. The improved Milagro algorithm performed the best by scaling almost linearly out to 244 processors for well load balanced problems.
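The second record above mentions solving the linear system at each implicit time step with a preconditioned GMRES algorithm. As a minimal, generic sketch of that pattern (not the code of any of the works listed; the ILU preconditioner and the Poisson-like placeholder matrix are assumptions made purely for illustration), SciPy expresses it compactly:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Placeholder linear system standing in for the Jacobian solve at one
# implicit time step (a 2-D Poisson stencil, illustrative only).
n = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
I = sp.identity(n)
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()
b = np.ones(A.shape[0])

# Incomplete LU factorization wrapped as a preconditioning operator.
ilu = spla.spilu(A, drop_tol=1e-4)
M = spla.LinearOperator(A.shape, ilu.solve)

# Restarted, preconditioned GMRES.
x, info = spla.gmres(A, b, M=M, restart=30)
print("info:", info, " residual:", np.linalg.norm(b - A @ x))

In a distributed-memory setting such as those described above, the preconditioner choice is the delicate part; the serial ILU here only stands in for whatever domain-decomposed preconditioner a production code would use.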