Summary: Data-Driven Batch Scheduling
A dissertation submitted
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
University of Wisconsin-Madison
In this thesis, we present a data-driven batch scheduling system. Current CPU-centric batch schedulers ignore
the data needs within workloads and execute them by linking them transparently and directly to their needed data.
When data-intensive workloads are scheduled on remote computational resources, this elegant solution of direct
data access can incur an order-of-magnitude performance penalty.
To concretely motivate this problem, we provide here a detailed analysis of six current data-intensive, scientific,