A Communication-Optimal Framework for Contracting Distributed Tensors
Tensor contractions are extremely compute-intensive generalized matrix multiplication operations encountered in many computational science fields, such as quantum chemistry and nuclear physics. Unlike distributed matrix multiplication, which has been studied extensively, distributed tensor contractions have received limited attention. In this paper, we characterize distributed tensor contraction algorithms on torus networks. We develop a framework with three fundamental communication operators to generate communication-efficient contraction algorithms for arbitrary tensor contractions. We show that, for a given amount of memory per processor, our framework is communication optimal for all tensor contractions. We demonstrate the performance and scalability of our framework on up to 262,144 cores of a BG/Q supercomputer using five tensor contraction examples.
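To illustrate the abstract's description of a contraction as a generalized matrix multiplication, the following is a minimal single-process sketch. It is not taken from the paper and does not model the distributed framework or its communication operators. The function names (`contract`, `matmul`, `flatten_rows`) and the example tensors are hypothetical, chosen only for illustration. It contracts a 3-index tensor `A[a][i][k]` with a matrix `B[k][b]` over the shared index `k`, then checks that the result equals an ordinary matrix product once the `(a, i)` index pair is merged into one row index.

```python
def contract(A, B):
    """C[a][i][b] = sum over k of A[a][i][k] * B[k][b], using nested lists."""
    return [[[sum(x * B[k][b] for k, x in enumerate(row))
              for b in range(len(B[0]))]
             for row in plane]
            for plane in A]


def matmul(M, N):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(M[r][k] * N[k][c] for k in range(len(N)))
             for c in range(len(N[0]))]
            for r in range(len(M))]


def flatten_rows(T):
    """Merge the first two indices of a 3-index tensor into one row index."""
    return [row for plane in T for row in plane]


# Illustrative data: A has indices (a, i, k), B has indices (k, b).
A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
B = [[1, 0, 2], [0, 1, 3]]

C = contract(A, B)                       # 3-index result C[a][i][b]
C_flat = matmul(flatten_rows(A), B)      # same numbers as a 4 x 3 matrix

assert flatten_rows(C) == C_flat
```

The check at the end holds because merging the uncontracted indices of each operand turns the contraction into a single matrix product, which is the sense in which contractions generalize matrix multiplication.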
- Publication Date:
- OSTI Identifier:
- Report Number(s):
- DOE Contract Number:
- Resource Type:
- Resource Relation: Conference: International Conference for High Performance Computing, Networking, Storage and Analysis (SC14), November 16-21, 2014, New Orleans, Louisiana, pp. 375-386; IEEE, Piscataway, NJ, United States (US).
- Research Org: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
- Sponsoring Org:
- Country of Publication: United States
- Subject: communication efficiency; tensor contractions; distributed memory