A massively parallel and scalable multi-GPU material point method

Wang, Xinlei; Qiu, Yuxing; Slattery, Stuart; Fang, Yu; Li, Minchen; Chun zhu, Song; Zhu, Yixin; Tang, Min; Manocha, Dinesh; Jiang, Chenfanfu

doi:10.1145/3386569.3392442

Conference · July 1, 2020

DOI: https://doi.org/10.1145/3386569.3392442 · OSTI ID: 1820872

Wang, Xinlei ^[1]; Qiu, Yuxing ^[1]; Slattery, Stuart ^[2]; Fang, Yu ^[1]; Li, Minchen ^[1]; Chun zhu, Song ^[3]; Zhu, Yixin ^[3]; Tang, Min ^[4]; Manocha, Dinesh ^[5]; Jiang, Chenfanfu ^[1]

[1] University of Pennsylvania
[2] ORNL
[3] UCLA
[4] Zhejiang University, Hangzhou, China
[5] University of Maryland

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point Method (MPM) for simulating physical behaviors of materials undergoing complex topological changes, self-collision, and large deformations. Our system makes three critical contributions. First, we introduce a new particle data structure that promotes coalesced memory access patterns on the GPU and eliminates the need for complex atomic operations on the memory hierarchy when writing particle data to the grid. Second, we propose a kernel fusion approach using a new Grid-to-Particles-to-Grid (G2P2G) scheme, which efficiently reduces GPU kernel launches, improves latency, and significantly reduces the amount of global memory needed to store particle data. Finally, we introduce optimized algorithmic designs that allow for efficient sparse grids in a shared memory context, enabling us to best utilize modern multi-GPU computational platforms for hybrid Lagrangian-Eulerian computational patterns. We demonstrate the effectiveness of our method with extensive benchmarks, evaluations, and dynamic simulations with elastoplasticity, granular media, and fluid dynamics. In comparisons against an open-source and heavily optimized CPU-based MPM codebase [Fang et al. 2019] on an elastic sphere colliding scene with particle counts ranging from 5 to 40 million, our GPU MPM achieves over 100x per-time-step speedup on a workstation with an Intel 8086K CPU and a single Quadro P6000 GPU, exposing exciting possibilities for future MPM simulations in computer graphics and computational science. Moreover, compared to the state-of-the-art GPU MPM method [Hu et al. 2019a], we not only achieve 2x acceleration on a single GPU but our kernel fusion strategy and Array-of-Structs-of-Array (AoSoA) data structure design also generalizes to multi-GPU systems. 
Our multi-GPU MPM exhibits near-perfect weak and strong scaling with 4 GPUs, enabling performant and large-scale simulations on a 1024^3 grid with close to 100 million particles with less than 4 minutes per frame on a single 4-GPU workstation and 134 million particles with less than 1 minute per frame on an 8-GPU workstation.
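
The abstract names an Array-of-Structs-of-Array (AoSoA) particle layout but does not spell it out. The following is a minimal sketch of the general idea only, not the authors' implementation: particles are grouped into fixed-size blocks, and within each block every attribute is stored as a short contiguous array. Consecutive particles then read the same attribute from adjacent addresses, which is the access pattern that coalesces well on a GPU. The names `LANES`, `FIELDS`, `aosoa_index`, and `pack_aosoa`, the block size of 4, and the choice of attributes are all assumptions made for illustration.

```python
# Hypothetical AoSoA layout sketch; every name and constant here is an
# illustrative assumption, not taken from the paper or its code.

LANES = 4                          # particles per block (assumed; a GPU code would likely use a warp-sized value)
FIELDS = ("x", "y", "z", "mass")   # per-particle attributes (assumed)


def aosoa_index(particle, field, lanes=LANES, fields=FIELDS):
    """Return the flat offset of `field` for `particle` in an AoSoA buffer."""
    block, lane = divmod(particle, lanes)
    block_size = lanes * len(fields)
    # Block start, then the attribute's sub-array, then the lane inside it.
    return block * block_size + fields.index(field) * lanes + lane


def pack_aosoa(particles, lanes=LANES, fields=FIELDS):
    """Pack per-particle dicts into one flat list, zero-padding the last block."""
    n_blocks = -(-len(particles) // lanes)  # ceiling division
    buf = [0.0] * (n_blocks * lanes * len(fields))
    for i, p in enumerate(particles):
        for f in fields:
            buf[aosoa_index(i, f, lanes, fields)] = p[f]
    return buf


# Six particles fill one full block and half of a second one.
particles = [
    {"x": float(i), "y": 10.0 + i, "z": 20.0 + i, "mass": 1.0}
    for i in range(6)
]
buffer = pack_aosoa(particles)
```

In this sketch the `x` values of particles 0 to 3 occupy offsets 0 to 3 and their `y` values occupy offsets 4 to 7, so a group of threads that each handle one particle would touch one contiguous run per attribute.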


Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC05-00OR22725

OSTI ID: 1820872

Resource Relation: Journal Volume: 39; Journal Issue: 4; Conference: SIGGRAPH 2020 - Washington D.C., District of Columbia, United States of America - 7/19/2020-7/23/2020

Country of Publication: United States

Language: English

Similar Records

Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system. In: XSEDE '12 Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond, Article No. 4

Conference · January 1, 2012

Humphrey, Alan; Meng, Qingyu; Berzins, Martin; +1 more

Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Journal Article · December 1, 2014 · Journal of Computational Physics

Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; +9 more

CUDA Computation of the Feynman Distribution

Journal Article · July 1, 2017 · Transactions of the American Nuclear Society

Talamo, A.; Gohar, Y.
