Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

Dong, Fengguang; Tomov, Stanimire; Dongarra, Jack

doi:10.2172/1173287

Technical Report · June 1, 2011

DOI: https://doi.org/10.2172/1173287 · OSTI ID: 1173287

Dong, Fengguang [1]; Tomov, Stanimire [1]; Dongarra, Jack [2]

[1] Univ. of Tennessee, Knoxville, TN (United States)
[2] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine, and to use a heterogeneous 1-D block cyclic distribution to allocate data to the host system and GPUs to minimize communication. We have designed heterogeneous algorithms with two different tile sizes (one for CPU cores and the other for GPUs) to cope with processor heterogeneity. We propose an auto-tuning method to determine the best tile sizes to attain both high performance and load balancing. We have also implemented a new runtime system and applied it to the Cholesky and QR factorizations. Our experiments on a compute node with two Intel Westmere hexa-core CPUs and three Nvidia Fermi GPUs demonstrate good weak scalability, strong scalability, load balance, and efficiency of our approach.
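As a rough illustration of the "heterogeneous 1-D block cyclic distribution" mentioned in the abstract, the sketch below assigns tile columns to devices in a repeating cycle, where faster devices receive more columns per cycle. The abstract does not give the report's actual mapping or ratios, so the function name `heterogeneous_block_cyclic`, the device names, and the weights are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def heterogeneous_block_cyclic(num_tile_columns, weights):
    """Return a list mapping each tile-column index to a device name.

    weights: dict of device name -> positive integer, the number of tile
    columns that device receives in each cycle (hypothetical values, not
    taken from the report).
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty dict of positive integers")
    # One cycle lists every device as many times as its weight.
    cycle = [dev for dev, w in weights.items() for _ in range(w)]
    # Repeat the cycle across all tile columns (1-D block cyclic pattern).
    return [cycle[j % len(cycle)] for j in range(num_tile_columns)]


# Assumed example: the host (all CPU cores together) gets 1 column per cycle,
# each of three GPUs gets 3 columns per cycle.
weights = {"cpu": 1, "gpu0": 3, "gpu1": 3, "gpu2": 3}
owners = heterogeneous_block_cyclic(20, weights)
print(Counter(owners))
```

Because each tile column has a fixed owner, a device only communicates when it needs a column owned by another device, which is the general motivation for a static distribution of this kind. Choosing the weights well is where something like the auto-tuning step described in the abstract would come in.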

Research Organization: Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: AC02-05CH11231

OSTI ID: 1173287

Report Number(s): LBNL-5783E

Country of Publication: United States

Language: English

Similar Records

A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

Journal Article · October 1, 2014 · Concurrency and Computation. Practice and Experience · OSTI ID:1173287

Song, Fengguang; Dongarra, Jack

Batched matrix computations on hardware accelerators based on GPUs

Journal Article · February 9, 2015 · International Journal of High Performance Computing Applications · OSTI ID:1173287

Haidar, Azzam; Dong, Tingxing; Luszczek, Piotr; +2 more

Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Journal Article · December 1, 2014 · Journal of Computational Physics · OSTI ID:1173287

Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; +9 more

Related Subjects

97 MATHEMATICS AND COMPUTING
