U.S. Department of Energy
Office of Scientific and Technical Information

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Journal Article · Scientific Programming
DOI: https://doi.org/10.1155/2015/502593 · OSTI ID: 1361290
 [1];  [2];  [2];  [2];  [2];  [2];  [2]
  1. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom)
  2. Univ. of Tennessee, Knoxville, TN (United States)

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. LAPACK compliance simplifies the use of the MAGMA MIC library in applications while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is scheduled over the heterogeneous hardware so as to minimize data movement and to map algorithmic requirements to the architectural strengths of the various hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which shields the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
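
The task split described in the abstract can be illustrated with a small sketch. The Python below is not the MAGMA MIC API and is not taken from the paper; it is a minimal blocked LU factorization (without pivoting, so it assumes a matrix such as a diagonally dominant one where no pivoting is needed) in which the function names `cpu_panel_factor`, `coprocessor_trailing_update`, `blocked_lu`, and `multiply_lu` are hypothetical. Everything runs on the host; the names only mark which kind of task a hybrid scheme would plausibly place on the CPU (small, latency-bound panel work) and which on the coprocessor (large, BLAS-3-like updates).

```python
# Illustrative sketch only: hypothetical names, no real offload, no pivoting.

def cpu_panel_factor(A, k, nb):
    """Fine-grained task: unblocked LU of the panel A[k:, k:k+nb], in place."""
    n = len(A)
    end = min(k + nb, n)
    for j in range(k, end):
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]               # multiplier, stored in L
            for c in range(j + 1, end):      # update stays inside the panel
                A[i][c] -= A[i][j] * A[j][c]


def coprocessor_trailing_update(A, k, nb):
    """Coarse-grained task: form U12, then update the trailing block A22."""
    n = len(A)
    end = min(k + nb, n)
    # U12 = inverse(L11) * A12, by forward substitution with unit-lower L11.
    for j in range(k, end):
        for i in range(j + 1, end):
            for c in range(end, n):
                A[i][c] -= A[i][j] * A[j][c]
    # A22 = A22 - L21 * U12 (the matrix-multiply-like bulk of the work).
    for i in range(end, n):
        for j in range(k, end):
            lij = A[i][j]
            for c in range(end, n):
                A[i][c] -= lij * A[j][c]


def blocked_lu(A, nb):
    """Return a copy of A overwritten with L (unit lower) and U (upper)."""
    LU = [row[:] for row in A]
    for k in range(0, len(LU), nb):
        cpu_panel_factor(LU, k, nb)             # small task
        coprocessor_trailing_update(LU, k, nb)  # large task
    return LU


def multiply_lu(LU):
    """Rebuild L * U from the packed factors, for checking the result."""
    n = len(LU)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for p in range(min(i, j) + 1):
                l_ip = 1.0 if p == i else LU[i][p]
                total += l_ip * LU[p][j]
            out[i][j] = total
    return out
```

Calling `blocked_lu(A, nb)` on a square list-of-lists matrix and passing the result to `multiply_lu` should reproduce `A` up to rounding. The block size `nb` controls the granularity trade-off: a larger `nb` gives bigger trailing updates but more serial panel work.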

Research Organization:
Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE; National Science Foundation (NSF) (United States); Intel Science and Technology Center (ISTC) for Big Data (United States); Russian Scientific Fund (Russian Federation)
Contributing Organization:
Univ. of Manchester (United Kingdom)
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1361290
Journal Information:
Scientific Programming, Vol. 2015; ISSN 1058-9244
Publisher:
Hindawi
Country of Publication:
United States
Language:
English
