OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight

Journal Article · Cluster Computing
[1]; [2]; [2]; [3]
  1. Chinese Academy of Sciences (CAS), Beijing (China)
  2. Peking Univ., Beijing (China); Peng Cheng Lab., Shenzhen (China)
  3. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Benchmarks for supercomputers are important tools, not only for evaluating and ranking modern supercomputers but also for guiding future architecture design. HPGMG (High-Performance Geometric Multigrid), a newer benchmark, solves a system of linear equations with a full geometric multigrid algorithm. It involves computation at different scales, data movement of varying volumes, and both global and neighbor communication with large and small messages, so it correlates more closely with real-world applications than traditional benchmarks such as LINPACK. It is therefore desirable to examine how well HPGMG performs on leadership supercomputers such as Sunway TaihuLight. Sunway TaihuLight, which ranked No. 1 on the TOP500 list from June 2016 to June 2018 and is built on the specially designed SW26010 many-core processor, is of great interest to the high-performance computing community. Through careful analysis and code design, we developed an efficient implementation of HPGMG on SW26010 processors. Besides employing established optimization techniques such as 2.5D partitioning, double buffering, and collective data loading, we introduced a micro-benchmark to guide the choice of optimization direction and parameter tuning. As a further contribution, we proposed a new procedure for the major operations that refines the granularity of the smooth function and the ghost exchange operation and reorders them, which reduces memory copies and accelerates communication. Our optimized implementation of HPGMG on Sunway TaihuLight achieved a ground-breaking performance of 1.036 × 10^12 degrees of freedom per second (DOF/s) at the finest level, ranking No. 1 on the November 2017 HPGMG list.
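For readers unfamiliar with the solver structure the benchmark exercises, the following is a minimal sketch of a full multigrid (FMG) solve: solve on a coarse grid, interpolate the result to the next finer grid, and improve it there with one V-cycle (smooth, restrict the residual, recurse, interpolate the correction back, smooth again). It is a hypothetical 1D textbook example, not HPGMG's implementation and not the authors' SW26010 code: HPGMG is a 3D benchmark with its own operators, smoothers, and distributed ghost exchange, none of which are modeled here. The weighted Jacobi smoother, the model problem -u'' = π² sin(πx) on (0, 1) with zero boundary values, and all function names (`smooth`, `restrict`, `prolong`, `v_cycle`, `fmg`, `dof_per_second`) are assumptions chosen for illustration. The `dof_per_second` helper assumes, following the abstract's "at the finest level", that the throughput metric counts only finest-grid unknowns divided by solve time.

```python
import math
import time


def residual(u, f, h):
    """r = f - A u, where A u = (-u[i-1] + 2u[i] - u[i+1]) / h^2 (zero Dirichlet ends)."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; an illustrative smoother, assumed here, not HPGMG's."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u


def restrict(r):
    """Full weighting: fine grid of 2m+1 points -> coarse grid of m+1 points."""
    m = (len(r) - 1) // 2
    rc = [0.0] * (m + 1)
    for j in range(1, m):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec):
    """Linear interpolation: coarse grid of m+1 points -> fine grid of 2m+1 points."""
    m = len(ec) - 1
    e = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        e[2 * j] = ec[j]
    for j in range(m):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle: pre-smooth, recursive coarse-grid correction, post-smooth."""
    if len(u) == 3:
        # Single interior unknown: 2 u[1] / h^2 = f[1], solved exactly.
        u[1] = 0.5 * h * h * f[1]
        return u
    smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    for i, e in enumerate(prolong(ec)):
        u[i] += e
    smooth(u, f, h, post)
    return u


def fmg(f_func, levels):
    """Full multigrid: solve on a 2-interval grid, then interpolate + one V-cycle per finer level."""
    n = 2
    h = 1.0 / n
    u = v_cycle([0.0] * (n + 1), [f_func(i * h) for i in range(n + 1)], h)
    for _ in range(levels):
        n *= 2
        h = 1.0 / n
        u = prolong(u)  # coarse solution becomes the fine-grid initial guess
        u = v_cycle(u, [f_func(i * h) for i in range(n + 1)], h)
    return u, h


def dof_per_second(unknowns, seconds):
    """Throughput metric (assumed form): finest-grid unknowns per second of solve time."""
    return unknowns / seconds


if __name__ == "__main__":
    def rhs(x):
        return math.pi ** 2 * math.sin(math.pi * x)  # exact solution: sin(pi x)

    start = time.perf_counter()
    u, h = fmg(rhs, levels=7)
    elapsed = time.perf_counter() - start
    err = max(abs(ui - math.sin(math.pi * i * h)) for i, ui in enumerate(u))
    print(f"max error {err:.2e}, {dof_per_second(len(u) - 2, elapsed):.3e} DOF/s")
```

Run as a script, it prints the maximum error against the exact solution sin(πx) and a DOF/s figure for this toy solve on 255 interior unknowns; the error should decrease as `levels` grows. The absolute DOF/s value of a pure-Python 1D toy says nothing about HPGMG performance on real hardware.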

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); National Key R&D Plan of China; Beijing Natural Science Foundation
Grant/Contract Number:
AC02-05CH11231; 2016YFB0200603; JQ18001
OSTI ID:
1580814
Journal Information:
Cluster Computing, Vol. 23, Issue 2; ISSN 1386-7857
Publisher:
Springer
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 2 works
Citation information provided by Web of Science

Similar Records

HPGMG
Software · March 10, 2014

The accurate particle tracer code
Journal Article · July 20, 2017 · Computer Physics Communications

Random circuit block-encoded matrix and a proposal of quantum LINPACK benchmark
Journal Article · June 14, 2021 · Physical Review A