HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Dongarra, Jack; Gates, Mark; Haidar, Azzam; Jia, Yulu; Kabir, Khairul; Luszczek, Piotr; Tomov, Stanimire

doi:10.1155/2015/502593

Journal Article · January 1, 2015 · Scientific Programming

DOI: https://doi.org/10.1155/2015/502593 · OSTI ID: 1361290

Dongarra, Jack [1]; Gates, Mark [2]; Haidar, Azzam [2]; Jia, Yulu [2]; Kabir, Khairul [2]; Luszczek, Piotr [2]; Tomov, Stanimire [2]

[1] Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)
[2] Univ. of Tennessee, Knoxville, TN (United States)

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. LAPACK compliance simplifies the use of the MAGMA MIC library in applications while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is scheduled over the heterogeneous hardware so as to minimize data movement and to map algorithmic requirements to the architectural strengths of the various hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
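
The hybridization idea in the abstract (splitting an algorithm into tasks of different granularity and mapping them to the hardware that suits them) can be illustrated with a small sketch. The code below is not MAGMA MIC code and does not use its API. It is a pure-Python, blocked right-looking Cholesky factorization in which each step is tagged with the device a hybrid scheme might plausibly assign it to. The function name `hybrid_cholesky`, the device labels, and the choice of Cholesky as the example are all assumptions made for illustration; the arithmetic runs on the host throughout.

```python
import math


def hybrid_cholesky(a_in, nb):
    """Blocked lower Cholesky of a symmetric positive definite matrix.

    Returns (L, schedule). `schedule` lists (device, task, block_start)
    tuples; the device labels are illustrative assumptions only, since
    every task here actually runs on the host.
    """
    n = len(a_in)
    a = [row[:] for row in a_in]  # work on a copy, leave the input intact
    schedule = []

    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # Small, latency-bound task: factor the diagonal block.
        schedule.append(("cpu", "potrf", k0))
        for j in range(k0, k1):
            d = a[j][j] - sum(a[j][p] * a[j][p] for p in range(k0, j))
            a[j][j] = math.sqrt(d)
            for i in range(j + 1, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]

        if k1 == n:
            break  # no panel or trailing matrix left

        # Larger, throughput-bound task: triangular solve for the panel.
        schedule.append(("coprocessor", "trsm", k0))
        for i in range(k1, n):
            for j in range(k0, k1):
                s = sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = (a[i][j] - s) / a[j][j]

        # Largest task: symmetric rank-nb update of the trailing matrix
        # (lower triangle only).
        schedule.append(("coprocessor", "syrk", k0))
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))

    # Clear the strictly upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a, schedule
```

Calling `hybrid_cholesky(A, nb)` on a small symmetric positive definite matrix returns the factor together with the task list. A larger block size `nb` produces fewer but coarser tasks, which is the granularity trade-off the abstract refers to.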

Research Organization: Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE; National Science Foundation (NSF); Intel Science and Technology Center (ISTC) for Big Data (United States); Russian Scientific Fund (Russian Federation)

Contributing Organization: Univ. of Manchester (United Kingdom)

Grant/Contract Number: AC05-00OR22725; ACI-1339822; N14-11-00190

OSTI ID: 1361290

Journal Information: Scientific Programming, Vol. 2015; ISSN 1058-9244

Publisher: Hindawi

Country of Publication: United States

Language: English

Citation Metrics: Cited by 14 works (citation information provided by Web of Science)

Related Subjects

97 MATHEMATICS AND COMPUTING
