DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales

Abstract

Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as there are compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.
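
The queue organization mentioned in the abstract (dedicated and shared ready queues per scheduler, with tasks placed by input data size and location) can be illustrated with a minimal sketch. This is not MATRIX code: the names `Task`, `Scheduler`, `enqueue`, `steal_from`, the size threshold, and the steal-half policy are all assumptions made for illustration, and the distributed key-value store for task metadata is left out.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical cutoff (an assumption, not a value from the paper): inputs at or
# above this size are treated as too costly to move to another node.
SIZE_THRESHOLD_BYTES = 1_000_000


@dataclass
class Task:
    task_id: str
    input_node: int   # node that holds the task's input data
    input_bytes: int  # size of that input


@dataclass
class Scheduler:
    node_id: int
    dedicated: deque = field(default_factory=deque)  # run locally, never stolen
    shared: deque = field(default_factory=deque)     # may be stolen by idle peers

    def enqueue(self, task: Task) -> str:
        """Place a ready task by input size and location; return the queue name."""
        is_local = task.input_node == self.node_id
        if is_local and task.input_bytes >= SIZE_THRESHOLD_BYTES:
            self.dedicated.append(task)
            return "dedicated"
        self.shared.append(task)
        return "shared"

    def next_task(self):
        """Prefer tasks pinned to local data, then fall back to the shared queue."""
        if self.dedicated:
            return self.dedicated.popleft()
        if self.shared:
            return self.shared.popleft()
        return None

    def steal_from(self, victim: "Scheduler") -> int:
        """Take half (rounded up) of the victim's shared tasks; return the count."""
        count = (len(victim.shared) + 1) // 2
        for _ in range(count):
            self.shared.append(victim.shared.pop())
        return count
```

In this sketch only the shared queue is exposed to stealing, so load balancing never pulls a task away from a large local input, which is one simple way to trade off the two goals the abstract names.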

Authors:
Wang, Ke [1]; Qiao, Kan [2]; Sadooghi, Iman [1]; Zhou, Xiaobing [3]; Li, Tonglin [1]; Lang, Michael [4]; Raicu, Ioan [5]
  1. Department of Computer Science, Illinois Institute of Technology, 10 W 31st St, Stuart Building, Room 002, Chicago, IL 60616, USA
  2. Google Inc., Seattle, WA 98103, USA
  3. Hortonworks Inc., Santa Clara, CA, USA
  4. Los Alamos National Laboratory, Los Alamos, NM, USA
  5. Department of Computer Science, Illinois Institute of Technology, 10 W 31st St, Stuart Building, Room 002, Chicago, IL 60616, USA; Argonne National Laboratory, Lemont, IL, USA
Publication Date:
2015-08-14
Sponsoring Org.:
USDOE
OSTI Identifier:
1786148
Grant/Contract Number:  
FC02-06ER25750
Resource Type:
Publisher's Accepted Manuscript
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Name: Concurrency and Computation. Practice and Experience; Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1532-0626
Publisher:
Wiley Blackwell (John Wiley & Sons)
Country of Publication:
United Kingdom
Language:
English

Citation Formats

Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, and Raicu, Ioan. Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales. United Kingdom: N. p., 2015. Web. doi:10.1002/cpe.3617.
Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, & Raicu, Ioan. Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales. United Kingdom. https://doi.org/10.1002/cpe.3617
Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, and Raicu, Ioan. 2015. "Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales". United Kingdom. https://doi.org/10.1002/cpe.3617.
@article{osti_1786148,
title = {Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales},
author = {Wang, Ke and Qiao, Kan and Sadooghi, Iman and Zhou, Xiaobing and Li, Tonglin and Lang, Michael and Raicu, Ioan},
abstractNote = {Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as there are compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.},
doi = {10.1002/cpe.3617},
journal = {Concurrency and Computation. Practice and Experience},
number = 1,
volume = 28,
place = {United Kingdom},
year = {2015},
month = {8}
}
