TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows
- BATTELLE (PACIFIC NW LAB)
A perennial bottleneck in distributed workflow analytics is the long access latency of remote data. We ask: assuming that data must be accessed remotely, can latencies be hidden? We present TAZeR, a framework that reduces data access latency while increasing data reuse. TAZeR transparently converts POSIX I/O into operations that interleave application work with data transfer, i.e., reads, prefetching, and write stage-out. TAZeR ensures that read data moves directly to application memory without synchronous intervention (soft zero-copy). TAZeR uses distributed, bandwidth-aware staging to exploit reuse across application tasks and to manage the capacity constraints of fast hierarchical storage. We evaluate TAZeR on a High Energy Physics workflow that requests remote data at 48 Gb/s (over two 1 Gb/s WAN links) using complex access patterns. TAZeR is 12× and 22× faster than XRootD (state-of-the-art) and file copies (current approach), respectively, and within 7% of optimal. We discuss the conditions under which TAZeR can hide I/O accesses and evaluate performance as effective staging sizes change.
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- AC05-76RL01830
- OSTI ID:
- 1606362
- Report Number(s):
- PNNL-SA-148879
- Country of Publication:
- United States
- Language:
- English