Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales
- Department of Computer Science Illinois Institute of Technology 10 W 31st St, Stuart Building, Room 002 Chicago IL 60616 USA
- Google Inc. Seattle WA 98103 USA
- Hortonworks Inc. Santa Clara CA USA
- Los Alamos National Laboratory Los Alamos NM USA
- Department of Computer Science Illinois Institute of Technology 10 W 31st St, Stuart Building, Room 002 Chicago IL 60616 USA, Argonne National Laboratory Lemont IL USA
Summary
Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as compute nodes to make scheduling decisions. Achieving distributed load balancing and exploiting data locality are two important goals for the performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique that optimizes both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but also achieves performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.
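The queue organization described in the summary can be illustrated with a minimal sketch. It is not the MATRIX implementation: the names `Task`, `Scheduler`, `size_threshold`, `enqueue`, and `steal_from`, the single size cutoff, and the steal-half policy are all assumptions made for illustration. The sketch only shows the general idea of keeping tasks with large local inputs in a dedicated queue while leaving small-input tasks in a shared queue that idle peers may steal from.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    data_location: str  # id of the node that holds the task's input data
    data_size: int      # input size in bytes


@dataclass
class Scheduler:
    node_id: str
    size_threshold: int  # hypothetical cutoff between "large" and "small" inputs
    dedicated: deque = field(default_factory=deque)  # runs locally, never stolen
    shared: deque = field(default_factory=deque)     # may be stolen by idle peers

    def enqueue(self, task: Task) -> str:
        """Queue the task locally if appropriate; return the id of the owning node."""
        if task.data_size >= self.size_threshold:
            if task.data_location == self.node_id:
                # Large input already on this node: keep it here for locality.
                self.dedicated.append(task)
                return self.node_id
            # Large input on another node: hand the task to the data owner.
            return task.data_location
        # Small input: moving the data is cheap, so the task is stealable.
        self.shared.append(task)
        return self.node_id

    def next_task(self):
        """Prefer locality-bound tasks, then stealable ones; None if idle."""
        if self.dedicated:
            return self.dedicated.popleft()
        if self.shared:
            return self.shared.popleft()
        return None

    def steal_from(self, victim: "Scheduler") -> int:
        """Steal half of the victim's shared queue (assumed policy); return the count."""
        count = len(victim.shared) // 2
        for _ in range(count):
            self.shared.append(victim.shared.pop())
        return count
```

In this sketch an idle scheduler would call `steal_from` on a peer only when `next_task` returns `None`, so locality-bound tasks in the dedicated queue never migrate away from their data.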
- Sponsoring Organization: USDOE
- Grant/Contract Number: FC02-06ER25750
- OSTI ID: 1786148
- Journal Information: Concurrency and Computation. Practice and Experience; Journal Volume: 28; Journal Issue: 1; ISSN 1532-0626
- Publisher: Wiley Blackwell (John Wiley & Sons)
- Country of Publication: United Kingdom
- Language: English