Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends

Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.; Krylov, Anna I.

doi:10.2172/1274416

Technical Report · Tue Jul 26 04:00:00 EDT 2016

DOI: https://doi.org/10.2172/1274416 · OSTI ID: 1274416

Ibrahim, Khaled Z. ^[1]; Epifanovsky, Evgeny ^[2]; Williams, Samuel W. ^[1]; Krylov, Anna I. ^[3]

[1] Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division
[2] Q-Chem, Inc., Pleasanton, CA (United States)
[3] Univ. of Southern California, Los Angeles, CA (United States). Dept. of Chemistry

Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular, and their parallelization has previously been achieved through dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve a speedup of up to 240× compared with the best optimized shared-memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q) and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)

DOE Contract Number: AC02-05CH11231

OSTI ID: 1274416

Report Number(s): LBNL--1005853; ir:1005853

Country of Publication: United States

Language: English

Similar Records

Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

Journal Article · Tue Mar 07 19:00:00 EST 2017 · Journal of Parallel and Distributed Computing · OSTI ID:1379911

A brief summary on formalizing parallel tensor distributions redistributions and algorithm derivations.

Technical Report · Tue Sep 01 00:00:00 EDT 2015 · OSTI ID:1222973

GPU acceleration of the Locally Selfconsistent Multiple Scattering code for first principles calculation of the ground state and statistical physics of materials

Journal Article · Mon Jul 11 20:00:00 EDT 2016 · Computer Physics Communications · OSTI ID:1335344

Related Subjects

74 ATOMIC AND MOLECULAR PHYSICS
97 MATHEMATICS AND COMPUTING
