Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks
Abstract
Here, we provide an extensive survey of a wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the methods we have developed to implement our scheduling strategies. Next, we compare the performance of these schedulers on various workloads running on the Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, CPU schedulers currently outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on TX1 than on Kepler, leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on TX1. Lastly, we propose new research directions involving the investigation of additional scheduling strategies.
- Authors:
- Muyan-Özçelik, Pınar; Owens, John D.
- California State Univ., Sacramento, CA (United States)
- Univ. of California, Davis, CA (United States)
- Publication Date:
- 2017-06-05
- Research Org.:
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC)
- OSTI Identifier:
- 1528898
- Grant/Contract Number:
- AC02-05CH11231
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Concurrency and Computation: Practice and Experience
- Additional Journal Information:
- Journal Volume: 29; Journal Issue: 15; Journal ID: ISSN 1532-0626
- Publisher:
- Wiley
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; GPU computing; multitasking; real‐time embedded tasks
Citation Formats
Muyan-Özçelik, Pınar, and Owens, John D. Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks. United States: N. p., 2017. Web. doi:10.1002/cpe.4118.
Muyan-Özçelik, Pınar, & Owens, John D. Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks. United States. https://doi.org/10.1002/cpe.4118
Muyan-Özçelik, Pınar, and Owens, John D. 2017. "Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks". United States. https://doi.org/10.1002/cpe.4118. https://www.osti.gov/servlets/purl/1528898.
@article{osti_1528898,
title = {Methods for multitasking among real-time embedded compute tasks running on the GPU: Methods for Multitasking Real-time Embedded GPU Computing Tasks},
author = {Muyan-Özçelik, Pınar and Owens, John D.},
abstractNote = {Here, we provide an extensive survey of a wide spectrum of scheduling methods for multitasking among graphics processing unit (GPU) computing tasks. We then design several schedulers and explain in detail the methods we have developed to implement our scheduling strategies. Next, we compare the performance of these schedulers on various workloads running on the Fermi and Kepler architectures and arrive at the following major conclusions: (1) Small kernels benefit from running concurrently. (2) The combination of small kernels, high-priority kernels with longer runtimes, and lower-priority kernels with shorter runtimes benefits from a CPU scheduler that dynamically changes kernel order on the Fermi architecture. (3) Because of limitations of existing GPU architectures, CPU schedulers currently outperform their GPU counterparts. We also provide results and observations obtained from implementing and evaluating our schedulers on the NVIDIA Jetson TX1 system-on-chip architecture. We observe that although TX1 has the newer Maxwell architecture, the mechanism used for scheduler timings behaves differently on TX1 than on Kepler, leading to incorrect timings. In this paper, we describe our methods that allow us to report correct timings for CPU schedulers running on TX1. Lastly, we propose new research directions involving the investigation of additional scheduling strategies.},
doi = {10.1002/cpe.4118},
journal = {Concurrency and Computation: Practice and Experience},
number = 15,
volume = 29,
place = {United States},
year = {2017},
month = {Jun}
}