U.S. Department of Energy
Office of Scientific and Technical Information

TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows

Conference

A perennial bottleneck in distributed workflow analytics is long access latencies for remote data. We ask the question: assuming that data must be accessed remotely, can latencies be hidden? We present TAZeR, a framework that reduces data access latency while increasing data reuse. TAZeR transparently converts POSIX I/O into operations that interleave application work with data transfer, i.e., read prefetching and write stage-out. TAZeR ensures that read data moves directly to application memory without synchronous intervention (soft zero-copy). TAZeR uses distributed, bandwidth-aware staging to exploit reuse across application tasks and to manage the capacity constraints of fast hierarchical storage. We evaluate TAZeR on a High Energy Physics workflow that requests remote data at 48 Gb/s over two 1 Gb/s WAN links, using complex access patterns. TAZeR is 12× and 22× faster than XRootD (state of the art) and file copies (current approach), respectively, and is within 7% of optimal. We discuss the conditions under which TAZeR can hide I/O accesses and evaluate performance as the effective staging size changes.
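The abstract describes converting POSIX reads into operations that overlap data transfer with application work, and staging data in capacity-limited fast storage. The Python sketch below illustrates only that general idea: read-ahead prefetching into a bounded LRU staging cache. It is not TAZeR's implementation or API. `StagingCache`, `PrefetchingReader`, `fetch_block`, `run_demo`, the block granularity, the lookahead depth, and the LRU eviction policy are all hypothetical choices made for illustration. The remote fetch is simulated with `time.sleep`, and soft zero-copy and distributed, bandwidth-aware staging are not modeled.

```python
import threading
import time
from collections import OrderedDict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


class StagingCache:
    """Hypothetical bounded LRU cache standing in for a capacity-limited staging tier."""
    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self._data: "OrderedDict[int, bytes]" = OrderedDict()
        self._lock = threading.Lock()

    def __contains__(self, idx: int) -> bool:
        with self._lock:
            return idx in self._data

    def get(self, idx: int) -> Optional[bytes]:
        with self._lock:
            if idx not in self._data:
                return None
            self._data.move_to_end(idx)  # mark as most recently used
            return self._data[idx]

    def put(self, idx: int, block: bytes) -> None:
        with self._lock:
            self._data[idx] = block
            self._data.move_to_end(idx)
            while len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the least recently used block


class PrefetchingReader:
    """Hypothetical reader that overlaps remote block fetches with application work."""
    def __init__(self, fetch_block: Callable[[int], bytes], num_blocks: int,
                 cache: StagingCache, lookahead: int = 2, workers: int = 2) -> None:
        self.fetch_block = fetch_block  # hypothetical remote fetch: block index -> bytes
        self.num_blocks = num_blocks
        self.cache = cache
        self.lookahead = lookahead
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._inflight: Dict[int, Future] = {}
        self._lock = threading.Lock()

    def _fetch_and_stage(self, idx: int) -> bytes:
        try:
            block = self.fetch_block(idx)
            self.cache.put(idx, block)  # stage before clearing the in-flight entry
            return block
        finally:
            with self._lock:
                self._inflight.pop(idx, None)

    def _submit(self, idx: int) -> Future:
        """Return the pending fetch for idx, starting one if none is in flight."""
        with self._lock:
            fut = self._inflight.get(idx)
            if fut is None:
                fut = self._pool.submit(self._fetch_and_stage, idx)
                self._inflight[idx] = fut
            return fut

    def read_block(self, idx: int) -> bytes:
        block = self.cache.get(idx)
        # On a miss, queue the demand fetch first so it never waits behind prefetches.
        pending = None if block is not None else self._submit(idx)
        for nxt in range(idx + 1, min(idx + 1 + self.lookahead, self.num_blocks)):
            if nxt not in self.cache:
                self._submit(nxt)  # background read-ahead
        return block if block is not None else pending.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def run_demo(num_blocks: int = 6, latency: float = 0.1, work: float = 0.1):
    fetches: List[int] = []
    def simulated_remote_fetch(idx: int) -> bytes:
        fetches.append(idx)  # record every remote access
        time.sleep(latency)  # stand-in for WAN latency
        return bytes([idx]) * 4
    reader = PrefetchingReader(simulated_remote_fetch, num_blocks, StagingCache(4))
    start = time.perf_counter()
    blocks = []
    for i in range(num_blocks):
        blocks.append(reader.read_block(i))
        time.sleep(work)  # stand-in for application compute on the block
    elapsed = time.perf_counter() - start
    reader.close()
    return blocks, elapsed, fetches


if __name__ == "__main__":
    blocks, elapsed, fetches = run_demo()
    print(f"{len(blocks)} blocks in {elapsed:.2f}s (serial ~1.2s), {len(fetches)} fetches")
```

With the defaults, each block has 0.1 s of simulated latency and 0.1 s of simulated compute. Because upcoming blocks are fetched while the current one is processed, the loop should take roughly the total compute time plus one fetch latency (around 0.7 s on an idle machine) rather than the 1.2 s of a fetch-then-compute loop. The LRU policy is used here only because it is the simplest capacity-limited policy; TAZeR's staging is described as distributed and bandwidth-aware, which this sketch does not attempt.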

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1606362
Report Number(s):
PNNL-SA-148879
Country of Publication:
United States
Language:
English

Similar Records

Transparent Asynchronous Zero-copy Remote I/O (TAZeR)
Software · Wed Nov 13 19:00:00 EST 2019 · OSTI ID:code-32519

Accessing Data Federations with CVMFS
Journal Article · Wed Nov 22 23:00:00 EST 2017 · Journal of Physics. Conference Series · OSTI ID:1399106

Latency hiding for caches
Patent · Tue Jul 26 00:00:00 EDT 2022 · OSTI ID:1924927
