DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Abstract

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this paper, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC’s key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently-released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3–4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Finally, our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.

Authors:
Madduri, Kamesh [1]; Im, Eun-Jin [2]; Ibrahim, Khaled Z. [1]; Williams, Samuel [1]; Ethier, Stéphane [3]; Oliker, Leonid [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  2. Kookmin Univ., Seoul (Korea, Republic of)
  3. Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21); USDOE Office of Science (SC), Fusion Energy Sciences (FES) (SC-24); Microsoft Corporation (United States); Intel Corporation (United States); National Research Foundation of Korea (NRF)
OSTI Identifier:
1407105
Grant/Contract Number:  
AC02-05CH11231; AC02-09CH11466; 024263; 024894; 2009-0083600; 2010-0003044
Resource Type:
Accepted Manuscript
Journal Name:
Parallel Computing
Additional Journal Information:
Journal Volume: 37; Journal Issue: 9; Journal ID: ISSN 0167-8191
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; particle-in-cell; multicore; manycore; code optimization; graphic processing units; Fermi

Citation Formats

Madduri, Kamesh, Im, Eun-Jin, Ibrahim, Khaled Z., Williams, Samuel, Ethier, Stéphane, and Oliker, Leonid. Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms. United States: N. p., 2011. Web. doi:10.1016/j.parco.2011.02.001.
Madduri, Kamesh, Im, Eun-Jin, Ibrahim, Khaled Z., Williams, Samuel, Ethier, Stéphane, & Oliker, Leonid. Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms. United States. doi:10.1016/j.parco.2011.02.001.
Madduri, Kamesh, Im, Eun-Jin, Ibrahim, Khaled Z., Williams, Samuel, Ethier, Stéphane, and Oliker, Leonid. 2011. "Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms". United States. doi:10.1016/j.parco.2011.02.001. https://www.osti.gov/servlets/purl/1407105.
@article{osti_1407105,
title = {Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms},
author = {Madduri, Kamesh and Im, Eun-Jin and Ibrahim, Khaled Z. and Williams, Samuel and Ethier, Stéphane and Oliker, Leonid},
abstractNote = {The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this paper, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC’s key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently-released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3–4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Finally, our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.},
doi = {10.1016/j.parco.2011.02.001},
journal = {Parallel Computing},
number = 9,
volume = 37,
place = {United States},
year = {2011},
month = {3}
}


