Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs
- Ohio State University
- BATTELLE (PACIFIC NW LAB)
Tensor contractions are higher dimensional analogs of matrix multiplications, used in many computational contexts such as high order models in quantum chemistry, deep learning, nite element methods etc. In contrast to the wide availability of high-performance libraries for matrix multiplication on GPUs, the same is not true for tensor contractions. In this paper, we address the optimization of a set of symmetrized tensor contractions that form the computational bottleneck in the CCSD(T) coupled-cluster method in computational chemistry suites like NWChem. Some of the challenges in optimizing tensor contractions that arise in practice from the variety of dimensionalities and shapes for tensors include effective mapping of the high-dimensional iteration space to threads, choice of data buffering in shared-memory and registers, and tile sizes for multi-level tiling. Furthermore, in the case of symmetrized tensor contractions in CCSD(T), it is also a challenge to fuse contractions to reduce data movement cost by exploiting reuse of intermediate tensors. In this paper, we developed an efficient GPU implementation of the tensor contractions in CCSD(T) using shared-memory buffering, register tiling, loop fusion and register transpose. Experimental results demonstrate significant improvement over the current state-of-the-art.
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1572874
- Report Number(s):
- PNNL-SA-134103
- Country of Publication:
- United States
- Language:
- English
Similar Records
Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples
Acceleration of Streamed Tensor Contraction Expressions on GPGPU-based Clusters