# Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations

## Abstract

We demonstrate a systematic implementation of recently developed fast explicit kinetic integration algorithms that efficiently solve N coupled ordinary differential equations (subject to initial conditions) on modern GPUs. Using representative test cases (Type Ia supernova explosions), we demonstrate an increase of two or more orders of magnitude in efficiency for solving such systems (realistic thermonuclear networks coupled to fluid dynamics). This implies that important coupled multiphysics problems in various scientific and technical disciplines that were previously intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible. As examples of such applications, we present the computational techniques developed for our ongoing deployment of these new methods on modern GPU accelerators. We show that, as in many other scientific applications ranging from national security to medical advances, the computation can be split into many independent tasks, each relatively small. Because an individual task does not provide sufficient parallelism for the underlying hardware, especially for accelerators, these tasks must be computed concurrently within a single routine, which we call a batched routine, in order to saturate the hardware with enough work.
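The batching idea in the abstract can be illustrated with a minimal sketch (this is not the paper's implementation; the linear two-species network, rates, and step sizes below are hypothetical): many small, independent kinetic networks are advanced together with an explicit step, so the hardware sees one large data-parallel workload instead of thousands of tiny ones.

```python
import numpy as np

def batched_explicit_euler(A, y0, dt, n_steps):
    """Advance a batch of linear kinetic systems dy/dt = A[i] @ y[i].

    A  : (batch, n, n) per-network rate matrices
    y0 : (batch, n)    initial abundances
    Returns abundances after n_steps explicit Euler steps, shape (batch, n).
    """
    y = y0.copy()
    for _ in range(n_steps):
        # One einsum applies every network's update at once -- the batched
        # analogue of launching a single fused GPU kernel over all systems,
        # rather than one small kernel per network.
        y = y + dt * np.einsum('bij,bj->bi', A, y)
    return y

# Example: 1000 independent two-species decay chains y1 -> y2,
# each with its own rate constant k.
np.random.seed(0)
batch, n = 1000, 2
k = np.random.uniform(0.5, 1.5, size=batch)   # per-network decay rates
A = np.zeros((batch, n, n))
A[:, 0, 0] = -k                               # d(y1)/dt = -k * y1
A[:, 1, 0] = k                                # d(y2)/dt = +k * y1
y0 = np.tile([1.0, 0.0], (batch, 1))

y = batched_explicit_euler(A, y0, dt=1e-3, n_steps=2000)
```

A real deployment on realistic thermonuclear networks would be nonlinear and stiff (requiring the fast explicit algorithms the paper refers to), but the batching structure is the same: the batch index becomes the outer parallel dimension on the GPU.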

- Authors:
- Shyles, Daniel; Dongarra, Jack J.; Guidry, Mike W.; Tomov, Stanimire Z.; Billings, Jay Jay; Brock, Benjamin A.; Haidar, Azzam A.

- Author Affiliations:
- University of Tennessee (UT)
- University of Tennessee, Knoxville (UTK)
- Oak Ridge National Laboratory (ORNL)

- Publication Date:
- September 1, 2016
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

- Sponsoring Org.:
- USDOE

- OSTI Identifier:
- 1393889

- DOE Contract Number:
- AC05-00OR22725

- Resource Type:
- Conference

- Resource Relation:
- Conference: 2016 IEEE High Performance Extreme Computing Conference (HPEC'16), Waltham, Massachusetts, United States of America, September 13, 2016

- Country of Publication:
- United States

- Language:
- English

### Citation Formats

```
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., and Haidar Ahmad, Azzam A.. Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. United States: N. p., 2016. Web.
```

```
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., & Haidar Ahmad, Azzam A.. Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. United States.
```

```
Shyles, Daniel, Dongarra, Jack J., Guidry, Mike W., Tomov, Stanimire Z., Billings, Jay Jay, Brock, Benjamin A., and Haidar Ahmad, Azzam A.. "Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations". United States. 2016. https://www.osti.gov/servlets/purl/1393889.
```

```
@article{osti_1393889,
  title = {Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations},
  author = {Shyles, Daniel and Dongarra, Jack J. and Guidry, Mike W. and Tomov, Stanimire Z. and Billings, Jay Jay and Brock, Benjamin A. and Haidar Ahmad, Azzam A.},
  abstractNote = {We demonstrate a systematic implementation of recently developed fast explicit kinetic integration algorithms that efficiently solve N coupled ordinary differential equations (subject to initial conditions) on modern GPUs. Using representative test cases (Type Ia supernova explosions), we demonstrate an increase of two or more orders of magnitude in efficiency for solving such systems (realistic thermonuclear networks coupled to fluid dynamics). This implies that important coupled multiphysics problems in various scientific and technical disciplines that were previously intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible. As examples of such applications, we present the computational techniques developed for our ongoing deployment of these new methods on modern GPU accelerators. We show that, as in many other scientific applications ranging from national security to medical advances, the computation can be split into many independent tasks, each relatively small. Because an individual task does not provide sufficient parallelism for the underlying hardware, especially for accelerators, these tasks must be computed concurrently within a single routine, which we call a batched routine, in order to saturate the hardware with enough work.},
  doi = {},
  place = {United States},
  year = {2016},
  month = {sep}
}
```