Title: Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: High Performance Computing, HPC 2012. Communications in Computer and Information Science

Abstract

A multi-scale finite element method code, msFEM, is tested on Jaguar and Nebulae, two petaflops computers that were ranked #1 and #2 on the June 2010 Top500 list at the time of the tests. The flat MPI version of msFEM is scaled from 20K to 200K CPU cores on Jaguar, delivering 70% parallel efficiency at 200K cores with a finite element model of eight million degrees of freedom. GPU versions, in both double precision and mixed precision and coded with MPI+OpenMP+CUDA hybrid programming, are run on 900 GPU nodes on Jaguar and 1500 GPU nodes on Nebulae, achieving more than 90% parallel efficiency on both systems. The mixed-precision GPU version delivers a further 1.5x speedup over the fully double-precision version at no significant implementation cost. The large-scale tests indicate that msFEM runs efficiently on petaflops computers and has strong potential for extreme-scale domain applications.
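
As a rough illustration of how a parallel-efficiency figure like the ones quoted in the abstract is commonly computed, the sketch below evaluates efficiency relative to a baseline run under a strong-scaling assumption (fixed problem size). The record does not state which definition the authors used, so this is an assumption. The function name and the wall-clock timings are hypothetical; only the 20K and 200K core counts come from the abstract.

```python
def relative_parallel_efficiency(base_cores, base_time, cores, time):
    """Strong-scaling efficiency relative to a baseline run.

    speedup       = base_time / time
    ideal speedup = cores / base_cores
    efficiency    = speedup / ideal speedup
    """
    if min(base_cores, cores) <= 0 or min(base_time, time) <= 0:
        raise ValueError("core counts and timings must be positive")
    speedup = base_time / time
    ideal_speedup = cores / base_cores
    return speedup / ideal_speedup


# Hypothetical wall-clock times in seconds, chosen only to illustrate the formula;
# they are NOT measurements from the paper.
base_cores, base_time = 20_000, 100.0
cores, time_at_scale = 200_000, 100.0 / 7.0

eff = relative_parallel_efficiency(base_cores, base_time, cores, time_at_scale)
print(f"relative parallel efficiency: {eff:.0%}")
```

With ten times as many cores, a run that is only seven times faster than the baseline corresponds to 70% relative efficiency under this definition.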

Authors:
Ren, Jiangyong [1]; Wang, ChaoWei [1]; Wang, Yingrui [1]; Tian, Rong [1]
  1. Chinese Academy of Sciences (CAS), Beijing (China). Inst. of Computing Technology, HPC Research Center
Publication Date:
Research Org.:
Oak Ridge National Laboratory, Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567533
Resource Type:
Conference
Resource Relation:
Journal Volume: 207; Conference: 8th CCF Conference on High Performance Computing, HPC 2012, Zhangjiajie, China, October 29-31, 2012
Country of Publication:
United States
Language:
English
Subject:
tens of thousands cores; scalability; gpu; mixed precision; finite element method

Citation Formats

Ren, Jiangyong, Wang, ChaoWei, Wang, Yingrui, and Tian, Rong. Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: High Performance Computing, HPC 2012. Communications in Computer and Information Science. United States: N. p., 2013. Web. doi:10.1007/978-3-642-41591-3_14.
Ren, Jiangyong, Wang, ChaoWei, Wang, Yingrui, & Tian, Rong. Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: High Performance Computing, HPC 2012. Communications in Computer and Information Science. United States. doi:10.1007/978-3-642-41591-3_14.
Ren, Jiangyong, Wang, ChaoWei, Wang, Yingrui, and Tian, Rong. "Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: High Performance Computing, HPC 2012. Communications in Computer and Information Science". United States. doi:10.1007/978-3-642-41591-3_14.
@article{osti_1567533,
title = {Scalability Tests of a Finite Element Code on Hundreds of Thousands Cores and Heterogeneous Architecture. In: High Performance Computing, HPC 2012. Communications in Computer and Information Science},
author = {Ren, Jiangyong and Wang, ChaoWei and Wang, Yingrui and Tian, Rong},
abstractNote = {A multi-scale finite element method code, msFEM, is tested on Jaguar and Nebulae, two petaflops computers that were ranked #1 and #2 on the June 2010 Top500 list at the time of the tests. The flat MPI version of msFEM is scaled from 20K to 200K CPU cores on Jaguar, delivering 70% parallel efficiency at 200K cores with a finite element model of eight million degrees of freedom. GPU versions, in both double precision and mixed precision and coded with MPI+OpenMP+CUDA hybrid programming, are run on 900 GPU nodes on Jaguar and 1500 GPU nodes on Nebulae, achieving more than 90% parallel efficiency on both systems. The mixed-precision GPU version delivers a further 1.5x speedup over the fully double-precision version at no significant implementation cost. The large-scale tests indicate that msFEM runs efficiently on petaflops computers and has strong potential for extreme-scale domain applications.},
doi = {10.1007/978-3-642-41591-3_14},
volume = 207,
place = {United States},
year = {2013},
month = {1}
}

