GPU acceleration and performance of the particle-beam-dynamics code Elegant
Abstract
Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and design of a variety of high-energy particle accelerators and accelerator-based systems. We discuss a recently developed version of the code that can take advantage of CUDA-enabled graphics processing units (GPUs) to achieve significantly improved performance for a large class of simulations that are important in practice. The GPU version is largely defined by a framework that simplifies implementations of the fundamental kernel types that are used by Elegant: particle operations, reductions, particle loss, histograms, array convolutions and random number generation. Accelerated performance on the Titan Cray XK-7 supercomputer is approximately 6–10 times better with the GPU than all the CPU cores associated with the same node count. In addition to performance, the maintainability of the GPU-accelerated version of the code was considered a key design objective. Accuracy with respect to the CPU implementation is also a core consideration. Finally, four different methods are used to ensure that the accelerated code faithfully reproduces the CPU results.
- Authors:
- King, J. R.; Pogorelov, I. V.; Amyx, K. M.; Borland, M.; Soliday, R.
- Tech-X Corporation, Boulder, CO (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Publication Date:
- 2018-10-16
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Tech-X Corporation, Boulder, CO (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Basic Energy Sciences (BES)
- OSTI Identifier:
- 1482835
- Alternate Identifier(s):
- OSTI ID: 1635832
- Grant/Contract Number:
- SC0004585; AC05-00OR22725
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Computer Physics Communications
- Additional Journal Information:
- Journal Volume: 235; Journal ID: ISSN 0010-4655
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 43 PARTICLE ACCELERATORS; 97 MATHEMATICS AND COMPUTING; particle-accelerator simulation; GPU acceleration
Citation Formats
King, J. R., Pogorelov, I. V., Amyx, K. M., Borland, M., and Soliday, R. GPU acceleration and performance of the particle-beam-dynamics code Elegant. United States: N. p., 2018. Web. doi:10.1016/j.cpc.2018.09.022.
King, J. R., Pogorelov, I. V., Amyx, K. M., Borland, M., & Soliday, R. GPU acceleration and performance of the particle-beam-dynamics code Elegant. United States. https://doi.org/10.1016/j.cpc.2018.09.022
King, J. R., Pogorelov, I. V., Amyx, K. M., Borland, M., and Soliday, R. 2018. "GPU acceleration and performance of the particle-beam-dynamics code Elegant". United States. https://doi.org/10.1016/j.cpc.2018.09.022. https://www.osti.gov/servlets/purl/1482835.
@article{osti_1482835,
title = {GPU acceleration and performance of the particle-beam-dynamics code Elegant},
author = {King, J. R. and Pogorelov, I. V. and Amyx, K. M. and Borland, M. and Soliday, R.},
abstractNote = {Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and design of a variety of high-energy particle accelerators and accelerator-based systems. We discuss a recently developed version of the code that can take advantage of CUDA-enabled graphics processing units (GPUs) to achieve significantly improved performance for a large class of simulations that are important in practice. The GPU version is largely defined by a framework that simplifies implementations of the fundamental kernel types that are used by Elegant: particle operations, reductions, particle loss, histograms, array convolutions and random number generation. Accelerated performance on the Titan Cray XK-7 supercomputer is approximately 6–10 times better with the GPU than all the CPU cores associated with the same node count. In addition to performance, the maintainability of the GPU-accelerated version of the code was considered a key design objective. Accuracy with respect to the CPU implementation is also a core consideration. Finally, four different methods are used to ensure that the accelerated code faithfully reproduces the CPU results.},
doi = {10.1016/j.cpc.2018.09.022},
journal = {Computer Physics Communications},
volume = {235},
place = {United States},
year = {2018},
month = oct
}