U.S. Department of Energy
Office of Scientific and Technical Information

Work stealing for GPU-accelerated parallel programs in a global address space framework

Journal Article · Concurrency and Computation: Practice and Experience
DOI: https://doi.org/10.1002/cpe.3747 · OSTI ID: 1393474
  1. Department of Computer Science and Engineering, The Ohio State University, Columbus OH USA
  2. Mathematics and Computer Science Division, Argonne National Laboratory, Lemont IL USA
  3. Computer Science and Mathematics Division, Pacific Northwest National Laboratory, Richland WA USA
Task parallelism is an attractive approach to automatically load balancing computation and adapting to the dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared- and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address-space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work-stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data-movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
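The core pattern the abstract refers to can be illustrated with a minimal shared-memory sketch (the function name, task granularity, and victim-selection policy here are illustrative assumptions, not the paper's actual implementation). Each worker owns a double-ended queue: the owner pops tasks from one end, while idle workers steal from the opposite end to rebalance load.

```python
# Minimal work-stealing sketch (illustrative only; the paper's system
# additionally handles distributed memory, GPU offload, and data-movement
# costs, none of which are modeled here).
import random
import threading
from collections import deque

def run_work_stealing(tasks, num_workers=4):
    """Execute `tasks` (callables returning numbers) across workers,
    balancing load by stealing; return the sum of all task results."""
    # One deque per worker; append/pop/popleft on deque are atomic.
    deques = [deque() for _ in range(num_workers)]
    # Distribute tasks round-robin to start; stealing fixes any imbalance.
    for i, t in enumerate(tasks):
        deques[i % num_workers].append(t)

    results = []
    results_lock = threading.Lock()

    def worker(wid):
        while True:
            task = None
            try:
                task = deques[wid].pop()            # owner takes from LIFO end
            except IndexError:
                # Local deque empty: try to steal from a random victim,
                # taking from the opposite (FIFO) end to reduce contention.
                victims = [v for v in range(num_workers) if v != wid]
                random.shuffle(victims)
                for v in victims:
                    try:
                        task = deques[v].popleft()
                        break
                    except IndexError:
                        continue
            if task is None:
                return                               # no work left anywhere
            r = task()
            with results_lock:
                results.append(r)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sum(results)
```

In the paper's distributed CPU-GPU setting, a steal may cross node boundaries through the global address space, so the scheduler must also weigh the cost of moving a task's data between CPU and GPU memory domains; this sketch shows only the load-balancing skeleton.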
Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1393474
Journal Information:
Concurrency and Computation: Practice and Experience, Vol. 28, Issue 13; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English


Similar Records

Work stealing for GPU-accelerated parallel programs in a global address space framework
Journal Article · January 2016 · Concurrency and Computation: Practice and Experience · OSTI ID: 1333989

Scalable Work Stealing
Conference · November 2009 · OSTI ID: 986715

Data-driven Fault Tolerance for Work Stealing Computations
Conference · June 2012 · OSTI ID: 1239507