DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on May 13, 2020

Title: Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight

Abstract

Benchmarks for supercomputers are important tools, not only for evaluating and ranking modern supercomputers, but also for providing hints for future architecture design. As a new benchmark, HPGMG (high performance geometric multigrid) solves a linear equation set with a full geometric multi-grid algorithm. It involves computation on different scales, data movement with various volumes, global communication and neighbor communication with both large and small messages, etc., and is more correlated to real world applications than traditional benchmarks such as LINPACK. Therefore, it is desirable to examine how well HPGMG can perform on leadership supercomputers such as Sunway TaihuLight. Sunway TaihuLight, the No. 1 supercomputer in the Top 500 list from June 2016 to June 2018, which uses a specially designed many-core architecture SW26010, is of great interest to the community of high performance computing. With careful analysis and code design, we came up with an efficient implementation of HPGMG on SW26010 processors. We not only employed traditional optimization techniques such as 2.5D partitioning, double buffering, and collective data load, but also introduced a micro-benchmark to help with the choice of optimization direction and parameter tuning. Another contribution is that we proposed a new procedure for the major operations, by granulating and reordering the smooth function and the ghost exchange operation, leading to reduced memory copy and accelerated communication process. Our optimized implementation of HPGMG on Sunway TaihuLight achieved a ground-breaking performance of 1.036 × 10^12 Degrees of Freedom per second at the finest level, which is No. 1 on the HPGMG list of Nov 2017.
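As a rough illustration of the geometric multigrid idea named in the abstract, the following is a minimal sketch, not the authors' implementation and not HPGMG itself. It assumes a 1D Poisson problem with zero boundary values, a weighted-Jacobi smoother, and a plain V-cycle (HPGMG uses a full multigrid cycle on 3D grids). All function names (`smooth`, `residual`, `restrict`, `prolong`, `v_cycle`, `run_demo`) are hypothetical and chosen only for this example.

```python
import math

def residual(u, f, h):
    """r = f - A u for the 1D Laplacian (2u_i - u_{i-1} - u_{i+1}) / h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0       # zero boundary value
        right = u[i + 1] if i < n - 1 else 0.0  # zero boundary value
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps; stands in for the benchmark's 'smooth' operation."""
    n = len(u)
    for _ in range(sweeps):
        new = [0.0] * n
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            jacobi = 0.5 * (left + right + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jacobi
        u = new
    return u

def restrict(r):
    """Full-weighting restriction: coarse point i sits on fine point 2i+1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
            for i in range(nc)]

def prolong(e, nf):
    """Linear interpolation of a coarse correction onto nf fine points."""
    out = [0.0] * nf
    for i, v in enumerate(e):
        out[2 * i] += 0.5 * v
        out[2 * i + 1] += v
        out[2 * i + 2] += 0.5 * v
    return out

def v_cycle(u, f, h):
    """One V-cycle: smooth, restrict residual, recurse, correct, smooth."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h)
    u = [a + b for a, b in zip(u, prolong(coarse_err, n))]
    return smooth(u, f, h)

def run_demo(levels=6, cycles=10):
    """Solve -u'' = pi^2 sin(pi x); return (residual reduction, max error)."""
    n = 2 ** levels - 1          # interior points, so every level halves cleanly
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    r0 = max(abs(v) for v in residual(u, f, h))
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    r = max(abs(v) for v in residual(u, f, h))
    err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
    return r / r0, err
```

Calling `run_demo()` returns the factor by which the residual shrank and the maximum error against the exact solution sin(πx). A Degrees-of-Freedom-per-second figure in the spirit of the abstract would then be the number of unknowns on the finest grid divided by the measured solve time, which this sketch does not attempt to reproduce.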

Authors:
Ma, Wenjing; Ao, Yulong; Yang, Chao; Williams, Samuel
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1580814
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Cluster Computing
Additional Journal Information:
Journal Name: Cluster Computing; Journal ID: ISSN 1386-7857
Publisher:
Springer
Country of Publication:
United States
Language:
English

Citation Formats

Ma, Wenjing, Ao, Yulong, Yang, Chao, and Williams, Samuel. Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. United States: N. p., 2019. Web. doi:10.1007/s10586-019-02938-w.
Ma, Wenjing, Ao, Yulong, Yang, Chao, & Williams, Samuel. Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight. United States. doi:10.1007/s10586-019-02938-w.
Ma, Wenjing, Ao, Yulong, Yang, Chao, and Williams, Samuel. Mon . "Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight". United States. doi:10.1007/s10586-019-02938-w.
@article{osti_1580814,
title = {Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight},
author = {Ma, Wenjing and Ao, Yulong and Yang, Chao and Williams, Samuel},
abstractNote = {Benchmarks for supercomputers are important tools, not only for evaluating and ranking modern supercomputers, but also for providing hints for future architecture design. As a new benchmark, HPGMG (high performance geometric multigrid) solves a linear equation set with a full geometric multi-grid algorithm. It involves computation on different scales, data movement with various volumes, global communication and neighbor communication with both large and small messages, etc., and is more correlated to real world applications than traditional benchmarks such as LINPACK. Therefore, it is desirable to examine how well HPGMG can perform on leadership supercomputers such as Sunway TaihuLight. Sunway TaihuLight, the No. 1 supercomputer in the Top 500 list from June 2016 to June 2018, which uses a specially designed many-core architecture SW26010, is of great interest to the community of high performance computing. With careful analysis and code design, we came up with an efficient implementation of HPGMG on SW26010 processors. We not only employed traditional optimization techniques such as 2.5D partitioning, double buffering, and collective data load, but also introduced a micro-benchmark to help with the choice of optimization direction and parameter tuning. Another contribution is that we proposed a new procedure for the major operations, by granulating and reordering the smooth function and the ghost exchange operation, leading to reduced memory copy and accelerated communication process. Our optimized implementation of HPGMG on Sunway TaihuLight achieved a ground-breaking performance of 1.036 × 10^12 Degrees of Freedom per second at the finest level, which is No. 1 on the HPGMG list of Nov 2017.},
doi = {10.1007/s10586-019-02938-w},
journal = {Cluster Computing},
place = {United States},
year = {2019},
month = {5}
}

