OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Optimizing fusion PIC code performance at scale on Cori Phase 2

Abstract

In this paper we present the results of optimizing the performance of the gyrokinetic full-f fusion PIC code XGC1 on the Cori Phase Two Knights Landing system. The code has undergone substantial development to enable the use of vector instructions in its most expensive kernels within the NERSC Exascale Science Applications Program. We study the single-node performance of the code on an absolute scale using the roofline methodology to guide optimization efforts. We have obtained 2x speedups in single-node performance by enabling vectorization and performing memory layout optimizations. On multiple nodes, the code is shown to scale well up to 4000 nodes, nearly half the size of the machine. We discuss some communication bottlenecks that were identified and resolved during the work.
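The roofline methodology mentioned in the abstract bounds a kernel's attainable performance by the lesser of the machine's peak compute rate and the product of the kernel's arithmetic intensity and the memory bandwidth. The following is a minimal sketch of that bound, not the authors' analysis: the function name `attainable_gflops` and the machine parameters `PEAK_GFLOPS` and `BANDWIDTH_GBS` are hypothetical values chosen only for illustration and are not measurements of Cori or of XGC1.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lesser of the compute roof and the memory roof.

    arithmetic_intensity is in flops per byte moved from memory.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


# Hypothetical machine parameters (assumptions, not Cori KNL measurements)
PEAK_GFLOPS = 2000.0    # peak compute rate in GFLOP/s
BANDWIDTH_GBS = 400.0   # memory bandwidth in GB/s

# Arithmetic intensity at which the two roofs meet
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

for ai in (0.5, 5.0, 20.0):
    bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
    regime = "memory-bound" if ai < ridge_point else "compute-bound"
    print(f"AI = {ai:5.1f} flops/byte: at most {bound:7.1f} GFLOP/s ({regime})")
```

A kernel whose arithmetic intensity lies to the right of the ridge point cannot be sped up by faster memory alone, which is why optimizations such as vectorization target the compute roof instead.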

Authors:
Koskela, T. S.; Deslippe, J.
Publication Date:
July 2017
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1398507
DOE Contract Number:
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: To be determined
Country of Publication:
United States
Language:
English
Subject:
70 PLASMA PHYSICS AND FUSION TECHNOLOGY; 97 MATHEMATICS AND COMPUTING

Citation Formats

Koskela, T. S., and Deslippe, J. Optimizing fusion PIC code performance at scale on Cori Phase 2. United States: N. p., 2017. Web. doi:10.1007/978-3-319-67630-2_32.
Koskela, T. S., & Deslippe, J. Optimizing fusion PIC code performance at scale on Cori Phase 2. United States. doi:10.1007/978-3-319-67630-2_32.
Koskela, T. S., and Deslippe, J. 2017. "Optimizing fusion PIC code performance at scale on Cori Phase 2". United States. doi:10.1007/978-3-319-67630-2_32. https://www.osti.gov/servlets/purl/1398507.
@article{osti_1398507,
title = {Optimizing fusion PIC code performance at scale on Cori Phase 2},
author = {Koskela, T. S. and Deslippe, J.},
abstractNote = {In this paper we present the results of optimizing the performance of the gyrokinetic full-f fusion PIC code XGC1 on the Cori Phase Two Knights Landing system. The code has undergone substantial development to enable the use of vector instructions in its most expensive kernels within the NERSC Exascale Science Applications Program. We study the single-node performance of the code on an absolute scale using the roofline methodology to guide optimization efforts. We have obtained 2x speedups in single-node performance by enabling vectorization and performing memory layout optimizations. On multiple nodes, the code is shown to scale well up to 4000 nodes, nearly half the size of the machine. We discuss some communication bottlenecks that were identified and resolved during the work.},
doi = {10.1007/978-3-319-67630-2_32},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2017,
month = 7
}


  • We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high AI and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path with theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
  • A new highly accurate steam measurement system has been developed for the measurement of steam quality and flow rate. This system consists of a V cone (differential pressure device) and a vortex meter (velocity device) in series, along with temperature and pressure sensors all interfacing with an electronic datalogger. These basic mechanical meters are both rugged and relatively inexpensive for field monitoring purposes. Tests performed in active steam drive projects prove the viability of this system. As a result, every injector in the Coalinga field (around 100 steam injectors) has been economically justified for utilizing this new technology. Ever since the application of steam for commercial EOR operations, the petroleum industry has been searching for a method to determine both steam quality and flow rate for monitoring, analyzing and optimizing steam drive projects. This paper presents an overview of the new system, the background and development work that has been completed to date, and the reservoir analysis and economic justification required to implement this project.
  • GYRO is a code used for the direct numerical simulation of plasma microturbulence. It has been ported to a variety of modern MPP platforms including several modern commodity clusters, IBM SPs, and Cray XC, XT, and XE series machines. We briefly describe the mathematical structure of the equations, the data layout, and the redistribution scheme. Also, while the performance and scaling of GYRO on many of these systems have been shown before, here we show the comparative performance and scaling on four generations of Cray supercomputers including the newest addition, the Cray XC30. The more recently added hybrid OpenMP/MPI implementation also shows a great deal of promise on custom HPC systems that utilize fast CPUs and proprietary interconnects. Four machines of varying sizes were used in the experiment, all of which are located at the National Institute for Computational Sciences at the University of Tennessee at Knoxville and Oak Ridge National Laboratory. The advantages, limitations, and performance of using each system are discussed.
  • Fluid Catalytic Cracking (FCC) is an important conversion process for the refining industry. The improvement of FCC technology could have a great impact on the public in general by lowering the cost of transportation fuel. A recent review of the FCC technology development by Bienstock et al. of Exxon indicated that the use of computational fluid dynamics (CFD) simulation can be very effective in the advancement of the technology. Theologos and Markatos used a commercial CFD code to model an FCC riser reactor. National Laboratories of the U.S. Department of Energy (DOE) have accumulated immense CFD expertise over the years for various engineering applications. A recent DOE survey showed that National Laboratories are using their CFD expertise to help the refinery industry improve the FCC technology under DOE's Cooperative Research and Development Agreement (CRADA). Among them are Los Alamos National Laboratory with Exxon and Amoco and Argonne National Laboratory (ANL) with Chevron and UOP. This abstract briefly describes the current status of ANL's work. The objectives of the ANL CRADA work are (1) to use a CFD code to simulate FCC riser reactor flow and (2) to evaluate the impacts of operating conditions and design parameters on the product yields. The CFD code used in this work was originally developed for spray combustion simulation in early 1980 at Argonne. It has been successfully applied to diagnosing a number of multi-phase reacting flow problems in a magneto-hydrodynamic power train. A new version of the CFD code developed for the simulation of the FCC riser flow is called Integral CRacKing FLOw (ICRKFLO). The CFD code solves conservation equations of general flow properties for three phases: gaseous species, liquid droplets, and solid particles. General conservation laws are used in conjunction with rate equations governing the mass, momentum, enthalpy, and species for a multi-phase flow with gas species, liquid droplets, and solid particles.