Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales

Wang, Ke; Qiao, Kan; Sadooghi, Iman; Zhou, Xiaobing; Li, Tonglin; Lang, Michael; Raicu, Ioan

doi:10.1002/cpe.3617

Abstract

Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as there are compute nodes to make scheduling decisions. Achieving distributed load balancing and exploiting data locality are two important goals for the performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique that optimizes both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. A distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can also achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.
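
The queue organization described in the abstract can be illustrated with a minimal sketch. This is not the MATRIX implementation: the `Task` and `Scheduler` classes, the `SIZE_THRESHOLD_BYTES` cutoff, and the "steal half of the shared queue" rule are all assumptions made for illustration. The sketch only shows the general idea of keeping large, local-data tasks in a dedicated queue while exposing the remaining tasks in a shared queue that idle schedulers may steal from.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical cutoff: inputs at least this large are treated as too costly to move.
SIZE_THRESHOLD_BYTES = 1_000_000


@dataclass
class Task:
    name: str
    input_bytes: int
    data_node: int  # id of the node that holds the task's input data


@dataclass
class Scheduler:
    node_id: int
    dedicated: deque = field(default_factory=deque)  # run locally, never stolen
    shared: deque = field(default_factory=deque)     # may be stolen by idle peers

    def enqueue(self, task: Task) -> None:
        """Place a ready task by input size and location (assumed policy)."""
        is_local = task.data_node == self.node_id
        if is_local and task.input_bytes >= SIZE_THRESHOLD_BYTES:
            self.dedicated.append(task)
        else:
            self.shared.append(task)

    def next_task(self):
        """Prefer dedicated (large, local-data) tasks, then shared ones."""
        if self.dedicated:
            return self.dedicated.popleft()
        if self.shared:
            return self.shared.popleft()
        return None

    def steal_from(self, victim: "Scheduler") -> int:
        """Move half of the victim's shared tasks (rounded up) to this scheduler."""
        count = (len(victim.shared) + 1) // 2
        for _ in range(count):
            # Re-classify on arrival: a stolen task may be local to the thief.
            self.enqueue(victim.shared.pop())
        return count


# Usage: scheduler 1 is idle and steals from scheduler 0.
s0, s1 = Scheduler(0), Scheduler(1)
s0.enqueue(Task("big-local", 5_000_000, 0))
s0.enqueue(Task("small-local", 10, 0))
s0.enqueue(Task("big-remote", 5_000_000, 1))
s0.enqueue(Task("small-remote", 10, 1))
moved = s1.steal_from(s0)
```

In this sketch a task is re-classified when it arrives at the thief, so a large task whose data already lives on the stealing node ends up in that node's dedicated queue. A real scheduler would also need task metadata lookup, concurrency control, and a victim-selection policy, none of which are modeled here.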

Authors:

Wang, Ke [1]; Qiao, Kan [2]; Sadooghi, Iman [1]; Zhou, Xiaobing [3]; Li, Tonglin [1]; Lang, Michael [4]; Raicu, Ioan [5]

[1] Department of Computer Science, Illinois Institute of Technology, 10 W 31st St, Stuart Building, Room 002, Chicago, IL 60616, USA
[2] Google Inc., Seattle, WA 98103, USA
[3] Hortonworks Inc., Santa Clara, CA, USA
[4] Los Alamos National Laboratory, Los Alamos, NM, USA
[5] Department of Computer Science, Illinois Institute of Technology, 10 W 31st St, Stuart Building, Room 002, Chicago, IL 60616, USA; Argonne National Laboratory, Lemont, IL, USA

Publication Date: August 14, 2015

Sponsoring Org.: USDOE

OSTI Identifier: 1786148

Grant/Contract Number: FC02-06ER25750

Resource Type: Publisher's Accepted Manuscript

Journal Name: Concurrency and Computation. Practice and Experience

Additional Journal Information: Journal Volume: 28; Journal Issue: 1; Journal ID: ISSN 1532-0626

Publisher: Wiley Blackwell (John Wiley & Sons)

Country of Publication: United Kingdom

Language: English

Citation Formats


Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, and Raicu, Ioan. Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales. United Kingdom: N. p., 2015. Web. doi:10.1002/cpe.3617.


Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, & Raicu, Ioan. Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales. United Kingdom. https://doi.org/10.1002/cpe.3617


Wang, Ke, Qiao, Kan, Sadooghi, Iman, Zhou, Xiaobing, Li, Tonglin, Lang, Michael, and Raicu, Ioan. 2015. "Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales". United Kingdom. https://doi.org/10.1002/cpe.3617.


                    
@article{osti_1786148,

  title        = {Load‐balanced and locality‐aware scheduling for data‐intensive workloads at extreme scales},

  author       = {Wang, Ke and Qiao, Kan and Sadooghi, Iman and Zhou, Xiaobing and Li, Tonglin and Lang, Michael and Raicu, Ioan},

  abstractNote = {Summary Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as the compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. Distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.},

  doi          = {10.1002/cpe.3617},

  journal      = {Concurrency and Computation. Practice and Experience},

  number       = 1,

  volume       = 28,

  place        = {United Kingdom},

  year         = {2015},

  month        = aug

}


Journal Article:

Free Publicly Available Full Text

Accepted Manuscript (Publisher)

Publisher's Version of Record
https://doi.org/10.1002/cpe.3617


Works referenced in this record:

Strategies for dynamic load balancing on highly parallel computers
journal, January 1993

Willebeek-LeMair, M. H.; Reeves, A. P.
IEEE Transactions on Parallel and Distributed Systems, Vol. 4, Issue 9
DOI: 10.1109/71.243526

Achieving Efficient Distributed Scheduling with Message Queues in the Cloud for Many-Task Computing and High-Performance Computing
conference, May 2014

Sadooghi, Iman; Palur, Sandeep; Anthony, Ajay
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
DOI: 10.1109/CCGrid.2014.30

The Hadoop Distributed File System
conference, May 2010

Shvachko, Konstantin; Kuang, Hairong; Radia, Sanjay
2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
DOI: 10.1109/MSST.2010.5496972

The quest for scalable support of data-intensive workloads in distributed systems
conference, January 2009

Raicu, Ioan; Foster, Ian T.; Zhao, Yong
Proceedings of the 18th ACM international symposium on High performance distributed computing - HPDC '09
DOI: 10.1145/1551609.1551642

Dynamo: amazon's highly available key-value store
conference, January 2007

DeCandia, Giuseppe; Hastorun, Deniz; Jampani, Madan
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles - SOSP '07
DOI: 10.1145/1294261.1294281

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers
conference, September 2010

Zheng, Gengbin; Meneses, Esteban; Bhatele, Abhinav
2010 39th International Conference on Parallel Processing Workshops (ICPPW)
DOI: 10.1109/ICPPW.2010.65

Swift: Fast, Reliable, Loosely Coupled Parallel Computation
conference, July 2007

Zhao, Yong; Hategan, Mihael; Clifford, Ben
2007 IEEE Congress on Services (Services 2007)
DOI: 10.1109/SERVICES.2007.63

Accelerating large-scale data exploration through data diffusion
conference, January 2008

Raicu, Ioan; Zhao, Yong; Foster, Ian T.
Proceedings of the 2008 international workshop on Data-aware distributed computing - DADC '08
DOI: 10.1145/1383519.1383521

Impact of Over-Decomposition on Coordinated Checkpoint/Rollback Protocol
book, January 2012

Besseron, Xavier; Gautier, Thierry
Euro-Par 2011: Parallel Processing Workshops
DOI: 10.1007/978-3-642-29740-3_36

Scalable work stealing
conference, January 2009

Dinan, James; Larkins, D. Brian; Sadayappan, P.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
DOI: 10.1145/1654059.1654113

SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems
conference, January 2010

Guo, Yi; Zhao, Jisheng; Cave, Vincent
Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '10
DOI: 10.1145/1693453.1693504

GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System
conference, September 2015

Li, Tonglin; Ma, Chaoqi; Li, Jiabao
2015 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2015.90

Next generation job management systems for extreme-scale ensemble computing
conference, January 2014

Wang, Ke; Zhou, Xiaobing; Chen, Hao
Proceedings of the 23rd international symposium on High-performance parallel and distributed computing - HPDC '14
DOI: 10.1145/2600212.2600703

Dynamic circular work-stealing deque
conference, January 2005

Chase, David; Lev, Yossi
Proceedings of the 17th annual ACM symposium on Parallelism in algorithms and architectures - SPAA'05
DOI: 10.1145/1073970.1073974

Optimizing Data Locality for Fork/Join Programs Using Constrained Work Stealing
conference, November 2014

Lifflander, Jonathan; Krishnamoorthy, Sriram; Kale, Laxmikant V.
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2014.75

A multi-level load balancing scheme for OR-parallel exhaustive search programs on the multi-PSI
conference, January 1990

Furuichi, M.; Taki, K.; Ichiyoshi, N.
Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming - PPOPP '90
DOI: 10.1145/99163.99170

X10: an object-oriented approach to non-uniform cluster computing
conference, January 2005

Charles, Philippe; Grothoff, Christian; Saraswat, Vijay
Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming systems languages and applications - OOPSLA '05
DOI: 10.1145/1094811.1094852

FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems
conference, October 2014

Zhao, Dongfang; Zhang, Zhao; Zhou, Xiaobing
2014 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/BigData.2014.7004214

An approximate analysis of the join the shortest queue (JSQ) policy
journal, March 1996

Lin, Hwa-Chun; Raghavendra, C. S.
IEEE Transactions on Parallel and Distributed Systems, Vol. 7, Issue 3, p. 301-307
DOI: 10.1109/71.491583

All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids
journal, January 2010

Moretti, Christopher; Bui, Hoang; Hollingsworth, Karen
IEEE Transactions on Parallel and Distributed Systems, Vol. 21, Issue 1
DOI: 10.1109/TPDS.2009.49

ZHT: A Light-Weight Reliable Persistent Dynamic Scalable Zero-Hop Distributed Hash Table
conference, May 2013

Li, Tonglin; Zhou, Xiaobing; Brandstatter, Kevin
2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS)
DOI: 10.1109/IPDPS.2013.110

The implementation of the Cilk-5 multithreaded language
journal, May 1998

Frigo, Matteo; Leiserson, Charles E.; Randall, Keith H.
ACM SIGPLAN Notices, Vol. 33, Issue 5
DOI: 10.1145/277652.277725

Analysis of size interval task assignment policies
journal, August 2008

Bachmat, Eitan; Sarfati, Hagit
ACM SIGMETRICS Performance Evaluation Review, Vol. 36, Issue 2
DOI: 10.1145/1453175.1453199

Modeling Many-Task Computing Workloads on a Petaflop IBM Blue Gene/P Supercomputer
conference, May 2013

Wang, Ke; Ma, Zhangjie; Raicu, Ioan
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
DOI: 10.1109/IPDPSW.2013.274

Falkon: a Fast and Light-weight tasK executiON framework
conference, January 2007

Raicu, Ioan; Zhao, Yong; Dumitrescu, Catalin
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
DOI: 10.1145/1362622.1362680

A distributed dynamic load balancer for iterative applications
conference, November 2013

Menon, Harshitha; Kalé, Laxmikant
SC13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
DOI: 10.1145/2503210.2503284

Toward loosely coupled programming on petascale systems
conference, November 2008

Raicu, Ioan; Wilde, Mike
2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/SC.2008.5219768

Scalable Load Balancing Techniques for Parallel Computers
journal, July 1994

Kumar, V.; Grama, A. Y.; Vempaty, N. R.
Journal of Parallel and Distributed Computing, Vol. 22, Issue 1
DOI: 10.1006/jpdc.1994.1070

Overcoming Hadoop Scaling Limitations through Distributed Task Execution
conference, September 2015

Wang, Ke; Liu, Ning; Sadooghi, Iman
2015 IEEE International Conference on Cluster Computing (CLUSTER)
DOI: 10.1109/CLUSTER.2015.42

X10: an object-oriented approach to non-uniform cluster computing
journal, October 2005

Charles, Philippe; Grothoff, Christian; Saraswat, Vijay
ACM SIGPLAN Notices, Vol. 40, Issue 10
DOI: 10.1145/1103845.1094852

NP-complete scheduling problems
journal, June 1975

Ullman, J. D.
Journal of Computer and System Sciences, Vol. 10, Issue 3
DOI: 10.1016/S0022-0000(75)80008-0

On the Merits of Distributed Work-Stealing on Selective Locality-Aware Tasks
conference, October 2013

Paudel, Jeeva; Tardieu, Olivier; Amaral, Jose Nelson
2013 42nd International Conference on Parallel Processing (ICPP)
DOI: 10.1109/ICPP.2013.19

Epidemic algorithms for replicated database maintenance
conference, January 1987

Demers, Alan; Greene, Dan; Hauser, Carl
Proceedings of the sixth annual ACM Symposium on Principles of distributed computing - PODC '87
DOI: 10.1145/41840.41841

Optimizing load balancing and data-locality with data-aware scheduling
conference, October 2014

Wang, Ke; Zhou, Xiaobing; Li, Tonglin
2014 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/BigData.2014.7004220

Dryad: distributed data-parallel programs from sequential building blocks
conference, January 2007

Isard, Michael; Budiu, Mihai; Yu, Yuan
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 - EuroSys '07
DOI: 10.1145/1272996.1273005

A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks
conference, June 2015

Li, Tonglin; Keahey, Kate; Wang, Ke
HPDC'15: The 24th International Symposium on High-Performance Parallel and Distributed Computing, Proceedings of the 6th Workshop on Scientific Cloud Computing
DOI: 10.1145/2755644.2755650

Using simulation to explore distributed key-value stores for extreme-scale system services
conference, January 2013

Wang, Ke; Kulkarni, Abhishek; Lang, Michael
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '13
DOI: 10.1145/2503210.2503239

Towards Scalable Distributed Workload Manager with Monitoring-Based Weakly Consistent Resource Stealing
conference, June 2015

Wang, Ke; Zhou, Xiaobing; Qiao, Kan
HPDC'15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
DOI: 10.1145/2749246.2749249

Job placement with unknown duration and no preemption
journal, March 2001

Harchol-Balter, Mor
ACM SIGMETRICS Performance Evaluation Review, Vol. 28, Issue 4
DOI: 10.1145/544397.544399

Distributed computing in practice: the Condor experience
journal, January 2005

Thain, Douglas; Tannenbaum, Todd; Livny, Miron
Concurrency and Computation: Practice and Experience, Vol. 17, Issue 2-4, p. 323-356
DOI: 10.1002/cpe.938

Many-task computing for grids and supercomputers
conference, November 2008

Raicu, Ioan; Foster, Ian T.
2008 Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS)
DOI: 10.1109/MTAGS.2008.4777912

Similar Records in DOE PAGES and OSTI.GOV collections:

Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

Journal Article Sankaran, Ramanan ; Angel, Jordan ; Brown, W. Michael - Concurrency and Computation. Practice and Experience

Summary The growth in size of networked high performance computers along with novel accelerator‐based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub‐optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on…
Cited by 1
https://doi.org/10.1002/cpe.3457

Full Text Available
Locality-aware and load-balanced static task scheduling for MapReduce

Journal Article Selvitopi, Oguz ; Demirci, Gunduz Vehbi ; Turk, Ata ; ... - Future Generations Computer Systems

Task scheduling for MapReduce jobs has been an active area of research with the objective of decreasing the amount of data transferred during the shuffle phase via exploiting data locality. In the literature, generally only the scheduling of reduce tasks is considered with the assumption that scheduling of map tasks is already determined by the input data placement. However, in cloud or HPC deployments of MapReduce, the input data is located in a remote storage and scheduling map tasks gains importance. Here, we propose models for simultaneous scheduling of map and reduce tasks in order to improve data locality and…
Cited by 15
https://doi.org/10.1016/j.future.2018.06.035

Full Text Available
A Novel Coarsening Method for Scalable and Efficient Mesh Generation

Technical Report Yoo, A ; Hysom, D ; Gunney, B

In this paper, we propose a novel mesh coarsening method called brick coarsening method. The proposed method can be used in conjunction with any graph partitioners and scales to very large meshes. This method reduces problem space by decomposing the original mesh into fixed-size blocks of nodes called bricks, layered in a similar way to conventional brick laying, and then assigning each node of the original mesh to appropriate brick. Our experiments indicate that the proposed method scales to very large meshes while allowing simple RCB partitioner to produce higher-quality partitions with significantly less edge cuts. Our results further indicate…
https://doi.org/10.2172/1018444

Full Text Available
Parallel FETI‐DP algorithm for efficient simulation of large‐scale EM problems

Journal Article Zhang, Kedi ; Jin, Jian‐Ming - International Journal of Numerical Modelling. Electronic Networks, Devices and Fields

Summary An efficient parallelization of the dual‐primal finite‐element tearing and interconnecting (FETI‐DP) algorithm is presented for large‐scale electromagnetic simulations. As a nonoverlapping domain decomposition method, the FETI‐DP algorithm formulates a global interface problem, whose iterative solution is accelerated with a solution of a global corner problem. To achieve a good load balance for parallel computation, the original computational domain is decomposed into subdomains with similar sizes and shapes. The subdomains are then distributed to processors based on their close proximity to minimize inter‐processor communication. The parallel generalized minimal residual method, enhanced with the iterative classical Gram‐Schmidt orthogonalization scheme to reduce…
Cited by 7
https://doi.org/10.1002/jnm.2153

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

Conference Geng, Tong ; Li, Ang ; Shi, Runbin ; ...

The recent development of deep learning has been mostly focusing on Euclidean data, such as images, videos, audios, etc. However, most real-world information and relation are often expressed as graphs. To efficiently learn from graph data, graph convolutional networks (GCNs) emerge as a promising approach, showing advantages in several practical applications such as social network analysis, knowledge discovery, 3D modeling, motion capturing, etc. Real-world graphs are usually extremely large and imbalanced, posting significant performance demand and design challenges on the hardware dedicated for GCN inference. In this paper, we propose an architecture design called UW-GCN to accelerate graph convolutional network…
https://doi.org/10.1109/MICRO50266.2020.00079
