Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations
Abstract
We demonstrate the systematic implementation of recently developed fast explicit kinetic integration algorithms that efficiently solve N coupled ordinary differential equations (subject to initial conditions) on modern GPUs. We take representative test cases (Type Ia supernova explosions) and demonstrate an increase of two or more orders of magnitude in efficiency for solving such systems (realistic thermonuclear networks coupled to fluid dynamics). This implies that important coupled, multiphysics problems in various scientific and technical disciplines that were intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible. As examples of such applications, we present the computational techniques developed for our ongoing deployment of these new methods on modern GPU accelerators. We show that, as in many other scientific applications, ranging from national security to medical advances, the computation can be split into many independent computational tasks, each of relatively small size. Because an individual task does not provide sufficient parallelism for the underlying hardware, especially for accelerators, these tasks must be computed concurrently as a single routine, which we call a batched routine, in order to saturate the hardware with enough work.
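The batching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the CPU in plain Python, it substitutes a simple forward-Euler step for the fast explicit algorithms of the paper, and the two-species network A <-> B, the function names batched_euler_step and integrate_batch, and all rate values are assumptions made only for illustration. What it shows is the structure: one routine call advances every small, independent network in the batch by one step, which is the unit of work a GPU kernel would take on.

```python
# Minimal CPU sketch of a "batched" explicit integrator (illustrative only).
# Assumptions: each network is a hypothetical two-species system A <-> B with
# forward rate kf and backward rate kb; forward Euler stands in for the
# paper's explicit algorithms; names and values are not from the source.


def batched_euler_step(abundances, rates, dt):
    """Advance every network in the batch by one forward-Euler step.

    abundances: list of [a, b] pairs, one pair per independent network
    rates:      list of (kf, kb) pairs, one pair per network
    dt:         timestep shared by the whole batch
    """
    new_abundances = []
    for (a, b), (kf, kb) in zip(abundances, rates):
        # Net flux from A to B for this network; networks do not interact,
        # so on a GPU each iteration could be handled by its own thread.
        flux = kf * a - kb * b
        new_abundances.append([a - dt * flux, b + dt * flux])
    return new_abundances


def integrate_batch(abundances, rates, dt, n_steps):
    """Apply the batched step n_steps times and return the final abundances."""
    for _ in range(n_steps):
        abundances = batched_euler_step(abundances, rates, dt)
    return abundances
```

A call such as integrate_batch([[1.0, 0.0]] * 4, [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (5.0, 5.0)], 0.01, 5000) evolves four networks together. Forward Euler is stable here only because dt * (kf + kb) stays well below 2; stiff thermonuclear networks of the kind the abstract describes would need the specialized explicit methods the paper refers to.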
 Authors:
 University of Tennessee (UT)
 University of Tennessee, Knoxville (UTK)
 ORNL
 Publication Date:
 Research Org.:
 Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1393889
 DOE Contract Number:
 AC05-00OR22725
 Resource Type:
 Conference
 Resource Relation:
 Conference: 2016 IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, Massachusetts, United States of America, 9/13/2016
 Country of Publication:
 United States
 Language:
 English
Citation Formats
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., and Haidar Ahmad, Azzam A. Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. United States: N. p., 2016. Web.
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., & Haidar Ahmad, Azzam A. Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. United States.
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., and Haidar Ahmad, Azzam A. 2016. "Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations". United States. https://www.osti.gov/servlets/purl/1393889.
@article{osti_1393889,
title = {Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations},
author = {Shyles, Daniel and Dongarra, Jack J. and Guidry, Mike W. and Tomov, Stanimire Z. and Billings, Jay Jay and Brock, Benjamin A. and Haidar Ahmad, Azzam A.},
abstractNote = {We demonstrate the systematic implementation of recently developed fast explicit kinetic integration algorithms that efficiently solve N coupled ordinary differential equations (subject to initial conditions) on modern GPUs. We take representative test cases (Type Ia supernova explosions) and demonstrate an increase of two or more orders of magnitude in efficiency for solving such systems (realistic thermonuclear networks coupled to fluid dynamics). This implies that important coupled, multiphysics problems in various scientific and technical disciplines that were intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible. As examples of such applications, we present the computational techniques developed for our ongoing deployment of these new methods on modern GPU accelerators. We show that, as in many other scientific applications, ranging from national security to medical advances, the computation can be split into many independent computational tasks, each of relatively small size. Because an individual task does not provide sufficient parallelism for the underlying hardware, especially for accelerators, these tasks must be computed concurrently as a single routine, which we call a batched routine, in order to saturate the hardware with enough work.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2016,
month = 9
}

Explicit integration with GPU acceleration for large kinetic networks
In this study, we demonstrate the first implementation of recently developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve of order 100 realistic kinetic networks in parallel in the same time that standard implicit methods can solve a single such network on a CPU. In addition, this orders-of-magnitude decrease in computation time for solving systems of realistic kinetic networks implies that ...
Explicit integration with GPU acceleration for large kinetic networks
We demonstrate the first implementation of recently developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve of order 100 realistic kinetic networks in parallel in the same time that standard implicit methods can solve a single such network on a CPU. This orders-of-magnitude decrease in computation time for solving systems of realistic kinetic networks implies that important coupled, multiphysics problems in ...
Source-jerk analysis using a semi-explicit inverse kinetic technique
A method is proposed for measuring the effective reproduction factor, k, in subcritical systems. The method uses the transient response of a subcritical system to the sudden removal of an extraneous neutron source (i.e., a source jerk). The response is analyzed using an inverse kinetic technique that least-squares fits the exact analytical solution corresponding to a source-jerk transient as derived from the point-reactor model. It has been found that the technique can provide an accurate means of measuring k in systems that are close to critical (i.e., 0.95 < k < 1.0). As a system becomes more subcritical (i.e., k ...
Computations of laminar flame propagation using an explicit numerical method
Numerical methods are applied to a coupled system of one-dimensional unsteady reaction-diffusion equations to seek propagating wave solutions. These equations model flame propagation in certain combustion systems when constant-pressure combustion is assumed, with Lewis number unity, and when a Lagrangian coordinate transformation is introduced. In the numerical integration schemes the reaction terms are computed noniteratively, using two different second-order accurate methods. In the first, the numerical timestep is limited by the Lipschitz timestep constraint, as is the case with standard explicit schemes. The second scheme is constructed in such a way that this timestep limitation is not present and ...