An MPI + $X$ implementation of contact global search using Kokkos

Hansen, Glen A.; Xavier, Patrick G.; Mish, Sam P.; Voth, Thomas E.; Heinstein, Martin W.; Glass, Micheal W.

doi:10.1007/s00366-015-0418-x


Journal Article · October 5, 2015 · Engineering with Computers

DOI: https://doi.org/10.1007/s00366-015-0418-x · OSTI ID:1335669

Hansen, Glen A. ^[1]; Xavier, Patrick G. ^[1]; Mish, Sam P. ^[1]; Voth, Thomas E. ^[1]; Heinstein, Martin W. ^[1]; Glass, Micheal W. ^[1]

Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

This paper describes an approach that seeks to parallelize the spatial search associated with computational contact mechanics. In contact mechanics, the purpose of the spatial search is to find “nearest neighbors,” which is the prelude to an imprinting search that resolves the interactions between the external surfaces of contacting bodies. In particular, we are interested in the contact global search portion of the spatial search associated with this operation on domain-decomposition-based meshes. Specifically, we describe an implementation that combines standard domain-decomposition-based MPI-parallel spatial search with thread-level parallelism (MPI-X) available on advanced computer architectures (those with GPU coprocessors). Our goal is to demonstrate the efficacy of the MPI-X paradigm in the overall contact search. Standard MPI-parallel implementations typically use a domain decomposition of the external surfaces of bodies within the domain in an attempt to efficiently distribute computational work. This decomposition may or may not be the same as the volume decomposition associated with the host physics. The parallel contact global search phase is then employed to find and distribute surface entities (nodes and faces) that are needed to compute contact constraints between entities owned by different MPI ranks without further inter-rank communication. Key steps of the contact global search include computing bounding boxes, building surface entity (node and face) search trees and finding and distributing entities required to complete on-rank (local) spatial searches. To enable source-code portability and performance across a variety of different computer architectures, we implemented the algorithm using the Kokkos hardware abstraction library. While we targeted development towards machines with a GPU accelerator per MPI rank, we also report performance results for OpenMP with a conventional multi-core compute node per rank. 
Results here demonstrate a 47% decrease in the time spent within the global search algorithm, comparing the reference ACME algorithm with the GPU implementation, on an 18M face problem using four MPI ranks. Although further work remains to maximize performance on the GPU, this result illustrates the potential of the proposed implementation.
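The global search steps named in the abstract (compute bounding boxes, then find the entities another rank needs for its local search) can be illustrated with a minimal serial sketch. This is not the ACME or Kokkos implementation from the paper: ranks are simulated as dictionary keys rather than MPI processes, a brute-force overlap test stands in for the search trees, and the names `Box`, `bounding_box`, `ghost_lists`, and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box given by its low and high corners."""
    lo: Point
    hi: Point

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap when their intervals overlap on every axis.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )

    def inflated(self, tol: float) -> "Box":
        # Grow the box by a capture tolerance (an assumed parameter).
        return Box(
            tuple(c - tol for c in self.lo),
            tuple(c + tol for c in self.hi),
        )


def bounding_box(points: Sequence[Point]) -> Box:
    """Bounding box of one surface entity given as a list of vertices."""
    xs, ys, zs = zip(*points)
    return Box((min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs)))


def union(boxes: Sequence[Box]) -> Box:
    """Smallest box that contains every box in the sequence."""
    return Box(
        tuple(min(b.lo[i] for b in boxes) for i in range(3)),
        tuple(max(b.hi[i] for b in boxes) for i in range(3)),
    )


def ghost_lists(
    faces_by_rank: Dict[int, List[Sequence[Point]]], tol: float
) -> Dict[Tuple[int, int], List[int]]:
    """Map (owner, receiver) to the owner's local face indices to send.

    A face is sent when its inflated box overlaps the receiver's rank box.
    The rank box is the union of already inflated face boxes, so the test
    is deliberately conservative.
    """
    face_boxes = {
        rank: [bounding_box(face).inflated(tol) for face in faces]
        for rank, faces in faces_by_rank.items()
    }
    rank_boxes = {rank: union(bs) for rank, bs in face_boxes.items() if bs}

    sends: Dict[Tuple[int, int], List[int]] = {}
    for owner, boxes in face_boxes.items():
        for receiver, rank_box in rank_boxes.items():
            if receiver == owner:
                continue
            hits = [i for i, box in enumerate(boxes) if box.overlaps(rank_box)]
            if hits:
                sends[(owner, receiver)] = hits
    return sends
```

In a distributed setting, each `(owner, receiver)` list would correspond to one message, after which every rank could run its local search without further communication. The per-face bounding-box loop is the kind of independent work that maps naturally onto thread-level parallelism.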

Research Organization: Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

Sponsoring Organization: USDOE National Nuclear Security Administration (NNSA)

Grant/Contract Number: AC04-94AL85000

OSTI ID: 1335669

Report Number(s): SAND--2016-10645J; PII: 418

Journal Information: Engineering with Computers, Vol. 32, Issue 2; ISSN 0177-0667

Publisher: Springer

Country of Publication: United States

Language: English

Related Subjects

97 MATHEMATICS AND COMPUTING
contact problems
finite element analysis
partial differential equations
spatial searching