OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale, In: ISC High Performance 2016: High Performance Computing

Abstract

Time-dependent deterministic discrete ordinates transport codes are an important class of applications that pose significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires a scalable solution so that there is enough node-level memory to store all of the data. In our previous work, we demonstrated the first implementation that showed a significant performance benefit for single-node solves using GPUs. In this paper we extend our work to large problems and demonstrate the scalability of our solution on two petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried-and-tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main ‘sweep’ routine on different hardware, such as CPUs or GPUs.
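The abstract names the KBA domain decomposition and a model of the ‘sweep’ runtime without describing either. As a rough, hypothetical illustration (not the authors' improved performance model), the Python sketch below assumes the usual KBA layout: the x-y plane is split over a px × py grid of processes, each process's column is cut into nk blocks along z, and blocks are swept downwind in a pipeline. It further assumes that every block costs one step and that communication is free. The function names `kba_sweep_steps` and `kba_efficiency` and the parameters `px`, `py`, `nk` are invented for this example.

```python
def kba_sweep_steps(px: int, py: int, nk: int) -> int:
    """Simulate one sweep direction of a KBA-style pipeline.

    Hypothetical unit-cost model: each z-block takes one step and
    messages between neighbouring processes cost nothing.
    """
    if min(px, py, nk) < 1:
        raise ValueError("px, py and nk must all be >= 1")
    # done[i][j]: step at which process (i, j) finished its most recent z-block.
    done = [[0] * py for _ in range(px)]
    for _k in range(nk):              # z-blocks, swept in order
        for i in range(px):           # upwind to downwind in x
            for j in range(py):       # upwind to downwind in y
                west = done[i - 1][j] if i > 0 else 0    # block k from x-neighbour
                south = done[i][j - 1] if j > 0 else 0   # block k from y-neighbour
                # Block k needs both upwind faces plus this process's block k-1.
                done[i][j] = max(done[i][j], west, south) + 1
    # The far corner process is the last to finish.
    return done[px - 1][py - 1]


def kba_efficiency(px: int, py: int, nk: int) -> float:
    """Fraction of the sweep during which each process is busy."""
    return nk / kba_sweep_steps(px, py, nk)


if __name__ == "__main__":
    steps = kba_sweep_steps(4, 4, 10)
    print(steps, kba_efficiency(4, 4, 10))  # prints: 16 0.625
```

Under these assumptions the simulated count equals the closed form nk + px + py - 2, so the pipeline-fill cost grows with the process-grid dimensions while the useful work per process stays at nk steps. A realistic model would also need message latency and bandwidth, per-block compute cost, and the overlap of successive sweep directions.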

Authors:
Deakin, Tom [1]; McIntosh-Smith, Simon [1]; Gaudin, Wayne [2]
  1. Department of Computer Science, University of Bristol, Bristol, UK
  2. High Performance Computing, UK Atomic Weapons Establishment, Aldermaston, UK
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1567406
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Journal Name:
HIGH PERFORMANCE COMPUTING
Additional Journal Information:
Journal Volume: 9697; Conference: International Conference on High Performance Computing, Frankfurt, Germany, June 19-23, 2016
Publisher:
Springer International Publishing Switzerland 2016
Country of Publication:
United States
Language:
English
Subject:
Computer Science

Citation Formats

Deakin, Tom, McIntosh-Smith, Simon, and Gaudin, Wayne. Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale, In: ISC High Performance 2016: High Performance Computing. United States: N. p., 2016. Web. doi:10.1007/978-3-319-41321-1_22.
Deakin, Tom, McIntosh-Smith, Simon, & Gaudin, Wayne. Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale, In: ISC High Performance 2016: High Performance Computing. United States. doi:10.1007/978-3-319-41321-1_22.
Deakin, Tom, McIntosh-Smith, Simon, and Gaudin, Wayne. 2016. "Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale, In: ISC High Performance 2016: High Performance Computing". United States. doi:10.1007/978-3-319-41321-1_22.
@article{osti_1567406,
title = {Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale, In: ISC High Performance 2016: High Performance Computing},
author = {Deakin, Tom and McIntosh-Smith, Simon and Gaudin, Wayne},
abstractNote = {Time-dependent deterministic discrete ordinates transport codes are an important class of applications that pose significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires a scalable solution so that there is enough node-level memory to store all of the data. In our previous work, we demonstrated the first implementation that showed a significant performance benefit for single-node solves using GPUs. In this paper we extend our work to large problems and demonstrate the scalability of our solution on two petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried-and-tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main ‘sweep’ routine on different hardware, such as CPUs or GPUs.},
doi = {10.1007/978-3-319-41321-1_22},
journal = {HIGH PERFORMANCE COMPUTING},
volume = {9697},
place = {United States},
year = {2016},
month = {1}
}



Works referenced in this record:

An S_n Algorithm for the Massively Parallel CM-200 Computer
journal, March 1998

  • Baker, Randal S.; Koch, Kenneth R.
  • Nuclear Science and Engineering, Vol. 128, Issue 3
  • DOI: 10.13182/NSE98-1

Denovo: A New Three-Dimensional Parallel Discrete Ordinates Code in SCALE
journal, August 2010

  • Evans, Thomas M.; Stafford, Alissa S.; Slaybaugh, Rachel N.
  • Nuclear Technology, Vol. 171, Issue 2
  • DOI: 10.13182/NT171-171

Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications
journal, November 2000

  • Hoisie, Adolfy; Lubeck, Olaf; Wasserman, Harvey
  • The International Journal of High Performance Computing Applications, Vol. 14, Issue 4
  • DOI: 10.1177/109434200001400405

On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures
journal, July 2011

  • Pennycook, S. J.; Hammond, S. D.; Mudalige, G. R.
  • The Computer Journal, Vol. 55, Issue 2
  • DOI: 10.1093/comjnl/bxr073