
Title: Parallel performance optimizations on unstructured mesh-based simulations

Abstract

This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance caused by naive partitioning of the mesh and develops methods to generate mesh partitions with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data from runs on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can improve on the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
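
The two challenges named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the 6-cell ring mesh, the function names `load_imbalance` and `bfs_order`, and the use of a breadth-first ordering as a stand-in for a locality-improving data ordering are all assumptions made here for illustration. The sketch measures load imbalance as maximum load divided by mean load, and reorders cells so that mesh neighbors tend to sit close together in memory.

```python
from collections import deque


def load_imbalance(part_of_cell, num_parts):
    """Return max load / mean load, counting one unit of work per cell.

    A value of 1.0 means perfectly balanced; larger values mean the
    busiest process holds more than its fair share of cells.
    """
    loads = [0] * num_parts
    for part in part_of_cell:
        loads[part] += 1
    mean = sum(loads) / num_parts
    return max(loads) / mean


def bfs_order(adjacency, start=0):
    """Return cell indices in breadth-first order from `start`.

    Assumption: BFS is used only as a simple locality-friendly ordering;
    it is not the predictive ordering described in the paper.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        for neighbor in adjacency[cell]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical mesh: six cells connected in a ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

naive = [0, 0, 0, 0, 1, 1]      # process 0 owns four cells, process 1 owns two
balanced = [0, 0, 0, 1, 1, 1]   # three cells each

print(load_imbalance(naive, 2))
print(load_imbalance(balanced, 2))
print(bfs_order(ring))
```

In a real unstructured-mesh code the per-cell work is rarely uniform and the partitioner must also account for communication between partitions, so a production metric would weight cells and boundary exchanges rather than simply counting cells.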

Authors:
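 Sarje, Abhinav [1]; Song, Sukhyun [2]; Jacobsen, Douglas [3]; Huck, Kevin [4]; Hollingsworth, Jeffrey [2]; Malony, Allen [4]; Williams, Samuel [1]; Oliker, Leonid [1]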
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Univ. of Maryland, College Park (United States)
  3. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  4. Univ. of Oregon, Eugene, OR (United States)
Publication Date:
2015-06-01
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES)
OSTI Identifier:
1202396
Grant/Contract Number:  
AC02-05CH11231; SC0006723
Resource Type:
Accepted Manuscript
Journal Name:
Procedia Computer Science
Additional Journal Information:
Journal Volume: 51; Journal Issue: C; Conference: International Conference on Computational Science (ICCS 2015), Reykjavík (Iceland), 1-3 Jun 2015; Journal ID: ISSN 1877-0509
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; unstructured mesh; ocean modeling; graph partitioning; performance optimization

Citation Formats

Sarje, Abhinav, Song, Sukhyun, Jacobsen, Douglas, Huck, Kevin, Hollingsworth, Jeffrey, Malony, Allen, Williams, Samuel, and Oliker, Leonid. Parallel performance optimizations on unstructured mesh-based simulations. United States: N. p., 2015. Web. doi:10.1016/j.procs.2015.05.466.
Sarje, Abhinav, Song, Sukhyun, Jacobsen, Douglas, Huck, Kevin, Hollingsworth, Jeffrey, Malony, Allen, Williams, Samuel, & Oliker, Leonid. Parallel performance optimizations on unstructured mesh-based simulations. United States. https://doi.org/10.1016/j.procs.2015.05.466
Sarje, Abhinav, Song, Sukhyun, Jacobsen, Douglas, Huck, Kevin, Hollingsworth, Jeffrey, Malony, Allen, Williams, Samuel, and Oliker, Leonid. 2015. "Parallel performance optimizations on unstructured mesh-based simulations". United States. https://doi.org/10.1016/j.procs.2015.05.466. https://www.osti.gov/servlets/purl/1202396.
@article{osti_1202396,
title = {Parallel performance optimizations on unstructured mesh-based simulations},
author = {Sarje, Abhinav and Song, Sukhyun and Jacobsen, Douglas and Huck, Kevin and Hollingsworth, Jeffrey and Malony, Allen and Williams, Samuel and Oliker, Leonid},
abstractNote = {This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance caused by naive partitioning of the mesh and develops methods to generate mesh partitions with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data from runs on thousands of cores of the Cray XC30 supercomputer and show that our optimization strategies can improve on the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.},
doi = {10.1016/j.procs.2015.05.466},
journal = {Procedia Computer Science},
number = C,
volume = 51,
place = {United States},
year = {2015},
month = {6}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 7 works
Citation information provided by
Web of Science

Works referenced in this record:

Optimal Cache-Oblivious Mesh Layouts
journal, October 2009

  • Bender, Michael A.; Kuszmaul, Bradley C.; Teng, Shang-Hua
  • Theory of Computing Systems, Vol. 48, Issue 2
  • DOI: 10.1007/s00224-009-9242-2

The Combinatorial BLAS: design, implementation, and applications
journal, May 2011

  • Buluç, Aydın; Gilbert, John R.
  • The International Journal of High Performance Computing Applications, Vol. 25, Issue 4
  • DOI: 10.1177/1094342011403516

Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication
journal, July 1999

  • Catalyurek, U. V.; Aykanat, C.
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 10, Issue 7
  • DOI: 10.1109/71.780863

A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications
journal, August 2013


Analysis of the clustering properties of the Hilbert space-filling curve
journal, January 2001

  • Moon, B.; Jagadish, H. V.; Faloutsos, C.
  • IEEE Transactions on Knowledge and Data Engineering, Vol. 13, Issue 1
  • DOI: 10.1109/69.908985

A multi-resolution approach to global ocean modeling
journal, September 2013


Parallel static and dynamic multi-constraint graph partitioning
journal, January 2002

  • Schloegel, Kirk; Karypis, George; Kumar, Vipin
  • Concurrency and Computation: Practice and Experience, Vol. 14, Issue 3
  • DOI: 10.1002/cpe.605

Revisiting Hypergraph Models for Sparse Matrix Partitioning
journal, January 2007


Simple and Efficient Mesh Layout with Space-Filling Curves
journal, January 2012


Architecture Aware Partitioning Algorithms
book, January 2008


A new metric for dynamic load balancing
journal, December 2000


Works referencing / citing this record:

A structure-exploiting numbering algorithm for finite elements on extruded meshes, and its performance evaluation in Firedrake
journal, January 2016

  • Bercea, Gheorghe-Teodor; McRae, Andrew T. T.; Ham, David A.
  • Geoscientific Model Development, Vol. 9, Issue 10
  • DOI: 10.5194/gmd-9-3803-2016

Progress in Fast, Accurate Multi-scale Climate Simulations
journal, January 2015