The Minos Computing Library: efficient parallel programming for extremely heterogeneous systems

Gioiosa, Roberto; Mutlu, Burcu; Lee, Seyong; Vetter, Jeffrey; Picierro, Giulio; Cesati, Marco

doi:10.1145/3366428.3380770

Conference · February 1, 2020

DOI: https://doi.org/10.1145/3366428.3380770 · OSTI ID: 1669742

Gioiosa, Roberto ^[1]; Mutlu, Burcu ^[1]; Lee, Seyong ^[2]; Vetter, Jeffrey ^[2]; Picierro, Giulio ^[3]; Cesati, Marco ^[3]

[1] Pacific Northwest National Laboratory (PNNL)
[2] Oak Ridge National Laboratory (ORNL)
[3] University of Rome Tor Vergata, Italy

Hardware specialization has become the silver bullet to achieve efficient high performance, from Systems-on-Chip, where hardware specialization can be "extreme", to large-scale HPC systems. As the complexity of the systems increases, so does the complexity of programming such architectures in a portable way. This work introduces the Minos Computing Library (MCL), which combines system software, a programming model, and a programming model runtime to facilitate programming extremely heterogeneous systems. MCL supports the execution of several multi-threaded applications within the same compute node, performs asynchronous execution of application tasks, efficiently balances computation across hardware resources, and provides performance portability. We show that code developed on a personal desktop automatically scales up to fully utilize powerful workstations with 8 GPUs and down to power-efficient embedded systems. MCL provides up to 17.5x speedup over OpenCL on NVIDIA DGX-1 systems and up to 1.88x speedup on single-GPU systems. In multi-application workloads, MCL's dynamic resource allocation provides up to 2.43x performance improvement over manual, static resource allocation.
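
The contrast between dynamic and static resource allocation can be illustrated with a toy model. The sketch below is not MCL's scheduler or API. It is a hypothetical simulation in which every name (`static_makespan`, `dynamic_makespan`, the task costs, the device speeds) is an assumption made for illustration. It compares a fixed round-robin assignment of tasks to devices with a greedy policy that hands each task to whichever device becomes free first.

```python
import heapq


def static_makespan(task_costs, device_speeds):
    """Static allocation: task i is pinned to device i % n before execution."""
    finish = [0.0] * len(device_speeds)
    for i, cost in enumerate(task_costs):
        d = i % len(device_speeds)
        finish[d] += cost / device_speeds[d]
    # The workload is done when the slowest device finishes.
    return max(finish)


def dynamic_makespan(task_costs, device_speeds):
    """Dynamic allocation: each task goes to the device that frees up first."""
    # Min-heap of (time at which the device becomes free, device index).
    free_at = [(0.0, d) for d in range(len(device_speeds))]
    heapq.heapify(free_at)
    makespan = 0.0
    for cost in task_costs:
        t, d = heapq.heappop(free_at)
        t += cost / device_speeds[d]
        makespan = max(makespan, t)
        heapq.heappush(free_at, (t, d))
    return makespan


# Hypothetical workload: 12 equal tasks on one slow and one fast device.
tasks = [4.0] * 12
speeds = [1.0, 4.0]

static_t = static_makespan(tasks, speeds)
dynamic_t = dynamic_makespan(tasks, speeds)
print(f"static: {static_t}, dynamic: {dynamic_t}")
```

In this made-up configuration the static split leaves the fast device idle while the slow one works through its fixed share, so the dynamic policy finishes sooner. The numbers come from the toy model only and say nothing about the speedups reported in the abstract.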

Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-00OR22725

OSTI ID: 1669742

Resource Relation: Conference: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU), San Diego, California, United States of America, February 23, 2020

Country of Publication: United States

Language: English
