The Minos Computing Library: efficient parallel programming for extremely heterogeneous systems
- Pacific Northwest National Laboratory (PNNL)
- Oak Ridge National Laboratory (ORNL)
- University of Rome Tor Vergata, Italy
Hardware specialization has become the silver bullet for achieving efficient high performance, from Systems-on-Chip, where hardware specialization can be "extreme", to large-scale HPC systems. As the complexity of these systems increases, so does the complexity of programming them in a portable way. This work introduces the Minos Computing Library (MCL): system software, a programming model, and a programming model runtime that together facilitate programming extremely heterogeneous systems. MCL supports the execution of several multi-threaded applications within the same compute node, performs asynchronous execution of application tasks, efficiently balances computation across hardware resources, and provides performance portability. We show that code developed on a personal desktop automatically scales up to fully utilize powerful workstations with 8 GPUs and down to power-efficient embedded systems. MCL provides up to 17.5x speedup over OpenCL on NVIDIA DGX-1 systems and up to 1.88x speedup on single-GPU systems. In multi-application workloads, MCL's dynamic resource allocation provides up to 2.43x performance improvement over manual, static resource allocation.
- Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 1669742
- Resource Relation: Conference: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU), San Diego, California, United States of America, 2/23/2020
- Country of Publication: United States
- Language: English