DOE PAGES | U.S. Department of Energy
Office of Scientific and Technical Information

Title: Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks

Abstract

Here, we provide an extensive survey of a wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the selected methods we have developed to implement our scheduling strategies. Next, we compare the performance of schedulers on various workloads running on Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running kernels concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, CPU schedulers currently outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although the TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on the TX1 than on Kepler, leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on the TX1. Lastly, we propose new research directions involving the investigation of additional scheduling strategies.
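
The abstract's first conclusion, that small kernels benefit from running concurrently, and its mention of scheduler timing mechanisms can be illustrated with a short CUDA sketch. The code below is a minimal, hypothetical example (not code from the paper): it launches several small kernels on separate CUDA streams so the device may overlap them, and times the batch with CUDA events. The kernel body, task count, and problem size are placeholder assumptions.

// Illustrative sketch only: concurrent small-kernel launches via CUDA streams
// plus CUDA-event timing. Kernel, sizes, and task count are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void smallKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;   // trivial per-element work
}

int main()
{
    const int nTasks = 4;          // hypothetical number of embedded tasks
    const int n = 1 << 16;         // small problem size per task
    float *buf[nTasks];
    cudaStream_t stream[nTasks];
    cudaEvent_t start, stop;

    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    for (int t = 0; t < nTasks; ++t) {
        cudaMalloc(&buf[t], n * sizeof(float));
        cudaStreamCreate(&stream[t]);
    }

    // Issue each task's kernel on its own stream; small kernels launched this
    // way may execute concurrently on Fermi/Kepler-class hardware.
    cudaEventRecord(start);
    for (int t = 0; t < nTasks; ++t)
        smallKernel<<<(n + 255) / 256, 256, 0, stream[t]>>>(buf[t], n);

    cudaDeviceSynchronize();       // wait for all streams to finish
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time for the batch
    printf("elapsed: %.3f ms\n", ms);

    for (int t = 0; t < nTasks; ++t) {
        cudaStreamDestroy(stream[t]);
        cudaFree(buf[t]);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}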

Authors:
 Muyan-Özçelik, Pınar [1]; Owens, John D. [2]
  1. California State Univ., Sacramento, CA (United States)
  2. Univ. of California, Davis, CA (United States)
Publication Date:
2017-06-05
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1528898
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Concurrency and Computation: Practice and Experience
Additional Journal Information:
Journal Volume: 29; Journal Issue: 15; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; GPU computing; multitasking; real-time embedded tasks

Citation Formats

Muyan-Özçelik, Pınar, and Owens, John D. Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks. United States: N. p., 2017. Web. doi:10.1002/cpe.4118.
Muyan-Özçelik, Pınar, & Owens, John D. Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks. United States. https://doi.org/10.1002/cpe.4118
Muyan-Özçelik, Pınar, and Owens, John D. 2017. "Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks". United States. https://doi.org/10.1002/cpe.4118. https://www.osti.gov/servlets/purl/1528898.
@article{osti_1528898,
title = {Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks},
author = {Muyan-Özçelik, Pınar and Owens, John D.},
abstractNote = {Here, we provide an extensive survey of a wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the selected methods we have developed to implement our scheduling strategies. Next, we compare the performance of schedulers on various workloads running on Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running kernels concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, CPU schedulers currently outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although the TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on the TX1 than on Kepler, leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on TX1. Lastly, we propose new research directions involving the investigation of additional scheduling strategies.},
doi = {10.1002/cpe.4118},
journal = {Concurrency and Computation: Practice and Experience},
number = {15},
volume = {29},
place = {United States},
year = {2017},
month = {jun}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work
Citation information provided by Web of Science


Works referenced in this record:

Softshell: dynamic scheduling on GPUs
journal, November 2012

  • Steinberger, Markus; Kainz, Bernhard; Kerbl, Bernhard
  • ACM Transactions on Graphics, Vol. 31, Issue 6
  • DOI: 10.1145/2366145.2366180

The synchronous languages 12 years later
journal, January 2003


The ESTEREL language
journal, January 1991

  • Boussinot, F.; de Simone, R.
  • Proceedings of the IEEE, Vol. 79, Issue 9
  • DOI: 10.1109/5.97299

OptiX: a general purpose ray tracing engine
journal, July 2010

  • Parker, Steven G.; Robison, Austin; Stich, Martin
  • ACM Transactions on Graphics, Vol. 29, Issue 4
  • DOI: 10.1145/1778765.1778803

GRAMPS: A programming model for graphics pipelines
journal, January 2009

  • Sugerman, Jeremy; Fatahalian, Kayvon; Boulos, Solomon
  • ACM Transactions on Graphics, Vol. 28, Issue 1
  • DOI: 10.1145/1477926.1477930

The synchronous data flow programming language LUSTRE
journal, January 1991

  • Halbwachs, N.; Caspi, P.; Raymond, P.
  • Proceedings of the IEEE, Vol. 79, Issue 9
  • DOI: 10.1109/5.97300

Programming real-time applications with SIGNAL
journal, January 1991

  • LeGuernic, P.; Gautier, T.; Le Borgne, M.
  • Proceedings of the IEEE, Vol. 79, Issue 9
  • DOI: 10.1109/5.97301

Out-of-core Data Management for Path Tracing on Hybrid Resources
journal, April 2009


Multitasking Real-time Embedded GPU Computing Tasks
conference, January 2016

  • Muyan-Özçelik, Pınar; Owens, John D.
  • Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'16
  • DOI: 10.1145/2883404.2883408

Fragment-Parallel Composite and Filter
journal, June 2010


Cooperative Multitasking for GPU-Accelerated Grid Systems
conference, May 2010

  • Ino, Fumihiko; Ogita, Akihiro; Oita, Kentaro
  • 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
  • DOI: 10.1109/CCGRID.2010.18

Efficiently Using a CUDA-enabled GPU as Shared Resource
conference, June 2010

  • Peters, Hagen; Köper, Martin; Luttenberger, Norbert
  • 2010 IEEE 10th International Conference on Computer and Information Technology (CIT)
  • DOI: 10.1109/CIT.2010.204

Understanding the efficiency of ray traversal on GPUs
conference, January 2009

  • Aila, Timo; Laine, Samuli
  • Proceedings of the 1st ACM conference on High Performance Graphics - HPG '09
  • DOI: 10.1145/1572769.1572792

Message passing on data-parallel architectures
conference, May 2009

  • Stuart, Jeff A.; Owens, John D.
  • 2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)
  • DOI: 10.1109/IPDPS.2009.5161065

Portable and transparent software managed scheduling on accelerators for fair resource sharing
conference, January 2016

  • Margiolas, Christos; O'Boyle, Michael F. P.
  • Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016
  • DOI: 10.1145/2854038.2854040

PTask: operating system abstractions to manage GPUs as compute devices
conference, January 2011

  • Rossbach, Christopher J.; Currey, Jon; Silberstein, Mark
  • Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles - SOSP '11
  • DOI: 10.1145/2043556.2043579

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing
conference, March 2016

  • Wang, Zhenning; Yang, Jun; Melhem, Rami
  • 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
  • DOI: 10.1109/HPCA.2016.7446078

Analyzing CUDA workloads using a detailed GPU simulator
conference, April 2009

  • Bakhoda, Ali; Yuan, George L.; Fung, Wilson W. L.
  • 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • DOI: 10.1109/ISPASS.2009.4919648