OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Journal Article · Journal of Computational Physics
  1. College of Computer Science, National University of Defense Technology, Changsha 410073 (China)
  2. Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands)
  3. State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China)

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when CPUs and accelerators must collaborate to fully exploit the potential of heterogeneous systems. In this paper, using a tri-level hybrid and heterogeneous programming model based on MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows over complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and apply targeted kernel optimizations for the high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we make the CPU and GPU collaborate in HOSTA instead of relying on a naive GPU-only approach, and we present a novel scheme to balance the load between the memory-poor GPU and the memory-rich CPU. Taking CPU–GPU load balance into account, we increase the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization that minimizes PCI-e data transfers for the ghost and singularity data of 3D grid blocks, and we overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA achieves a parallel efficiency above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
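
The abstract does not show HOSTA's code, so the following CUDA C++ sketch only illustrates the general shape of a dual-level parallelization for multi-block structured grids: the outer level gives each grid block its own kernel launch (here on round-robin streams, so independent blocks can execute concurrently), and the inner level maps every cell of a block to one CUDA thread. The Block3D descriptor, the placeholder rhsKernel, and all other names are illustrative assumptions, not HOSTA's actual interfaces.

    #include <cuda_runtime.h>
    #include <cstddef>
    #include <vector>

    // Hypothetical per-block descriptor; HOSTA's real data layout is not given in the abstract.
    struct Block3D {
        int ni, nj, nk;        // interior cell counts in the three grid directions
        double *d_q, *d_rhs;   // device arrays, one value per cell (one variable for brevity)
    };

    // Inner level: one CUDA thread per cell of a structured block.
    __global__ void rhsKernel(const double *q, double *rhs, int ni, int nj, int nk)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int k = blockIdx.z * blockDim.z + threadIdx.z;
        if (i >= ni || j >= nj || k >= nk) return;
        int idx = (k * nj + j) * ni + i;
        rhs[idx] = -q[idx];    // placeholder physics; a real WCNS/HDCS stencil is far wider
    }

    // Outer level: one kernel launch per grid block, distributed over CUDA streams.
    void computeRHS(std::vector<Block3D> &blocks, std::vector<cudaStream_t> &streams)
    {
        dim3 tpb(8, 8, 4);
        for (std::size_t b = 0; b < blocks.size(); ++b) {
            Block3D &blk = blocks[b];
            dim3 grid((blk.ni + tpb.x - 1) / tpb.x,
                      (blk.nj + tpb.y - 1) / tpb.y,
                      (blk.nk + tpb.z - 1) / tpb.z);
            rhsKernel<<<grid, tpb, 0, streams[b % streams.size()]>>>(
                blk.d_q, blk.d_rhs, blk.ni, blk.nj, blk.nk);
        }
        cudaDeviceSynchronize();
    }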
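
The paper's actual load-balancing scheme is not detailed in the abstract; as a hedged illustration of the underlying idea, the host-side sketch below statically splits grid blocks so that the GPU receives roughly a target fraction of the cells while staying within its limited device memory, leaving the remainder to the memory-rich CPU. gpuFraction, gpuMemBudget, and the BlockInfo fields are assumed parameters, not values taken from the paper.

    #include <cstddef>
    #include <vector>

    struct BlockInfo {
        std::size_t cells;          // number of cells in this grid block
        std::size_t bytesOnDevice;  // device memory this block would occupy if placed on the GPU
    };

    // Illustrative static partition: the GPU gets about gpuFraction of the total cells,
    // capped by its device-memory budget; all remaining blocks stay on the CPU.
    void partitionBlocks(const std::vector<BlockInfo> &blocks,
                         double gpuFraction,         // e.g. ~0.6 if the GPU side is ~1.5x faster
                         std::size_t gpuMemBudget,   // usable device memory in bytes
                         std::vector<int> &gpuSet,
                         std::vector<int> &cpuSet)
    {
        std::size_t totalCells = 0;
        for (const BlockInfo &b : blocks) totalCells += b.cells;

        std::size_t gpuTargetCells = static_cast<std::size_t>(gpuFraction * totalCells);
        std::size_t gpuCells = 0, gpuBytes = 0;

        for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
            bool fits = gpuBytes + blocks[i].bytesOnDevice <= gpuMemBudget;
            if (fits && gpuCells < gpuTargetCells) {
                gpuSet.push_back(i);
                gpuCells += blocks[i].cells;
                gpuBytes += blocks[i].bytesOnDevice;
            } else {
                cpuSet.push_back(i);   // the CPU keeps the rest, exploiting its larger memory
            }
        }
    }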
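
The gather/scatter optimization and the computation/communication overlap could look roughly like the sketch below: a packing kernel gathers all ghost (and singularity) values of a block into one contiguous device buffer, a single cudaMemcpyAsync moves it across PCI-e into page-locked host memory on a dedicated communication stream, and non-blocking MPI calls exchange it while interior kernels on other streams keep the GPU busy. The buffer layout, index list, and function names are assumptions for illustration only.

    #include <cuda_runtime.h>
    #include <mpi.h>

    // Pack all ghost values listed in ghostIdx into one contiguous send buffer on the device,
    // replacing many small per-face copies with a single transfer.
    __global__ void gatherGhosts(const double *q, const int *ghostIdx, double *sendBuf, int n)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t < n) sendBuf[t] = q[ghostIdx[t]];
    }

    // h_sendBuf/h_recvBuf are assumed to be page-locked (cudaHostAlloc) host buffers.
    void exchangeGhosts(const double *d_q, const int *d_ghostIdx, int nGhost,
                        double *d_sendBuf, double *h_sendBuf, double *h_recvBuf,
                        int neighborRank, cudaStream_t commStream, MPI_Request req[2])
    {
        int tpb = 256, nblk = (nGhost + tpb - 1) / tpb;

        // 1. Gather all ghost values into one contiguous device buffer.
        gatherGhosts<<<nblk, tpb, 0, commStream>>>(d_q, d_ghostIdx, d_sendBuf, nGhost);

        // 2. One asynchronous PCI-e transfer on the communication stream.
        cudaMemcpyAsync(h_sendBuf, d_sendBuf, nGhost * sizeof(double),
                        cudaMemcpyDeviceToHost, commStream);
        cudaStreamSynchronize(commStream);   // only the communication stream waits here

        // 3. Non-blocking MPI exchange; interior kernels on other streams keep running.
        MPI_Irecv(h_recvBuf, nGhost, MPI_DOUBLE, neighborRank, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(h_sendBuf, nGhost, MPI_DOUBLE, neighborRank, 0, MPI_COMM_WORLD, &req[1]);
    }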

OSTI ID:
22382148
Journal Information:
Journal of Computational Physics, Vol. 278; ISSN 0021-9991. Copyright (c) 2014 Elsevier Science B.V., Amsterdam, The Netherlands. All rights reserved. Country of input: International Atomic Energy Agency (IAEA).
Country of Publication:
United States
Language:
English