Efficient I/O for Neural Network Training with Compressed Data
FanStore is a shared object store that enables efficient and scalable neural network training on supercomputers. By providing a global cache layer on node-local burst buffers using a compressed representation, it significantly enhances the processing capability of deep learning (DL) applications on existing hardware. In addition, FanStore allows POSIX-compliant file access to the compressed data in user space. We investigate the tradeoff between runtime overhead and data compression ratio using real-world datasets and applications, and propose a compressor selection algorithm to maximize storage capacity under performance constraints. We consider both asynchronous (i.e., with prefetching) and synchronous I/O strategies, and propose mechanisms for selecting compressors for each. Using FanStore, the same storage hardware can host 2-13x more data for example applications without significant runtime overhead, and our experiments show that FanStore scales to 512 compute nodes with near-linear performance.
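The compressor selection idea described in the abstract can be illustrated with a minimal sketch: among candidate compressors, choose the one with the highest compression ratio whose decompression throughput still satisfies the training pipeline's ingest rate. Under synchronous I/O the full ingest rate must be met, whereas prefetching hides part of the decompression cost and relaxes the constraint. All names and throughput/ratio numbers below are hypothetical, for illustration only; this is not the paper's actual algorithm or measurements.

```python
from dataclasses import dataclass

@dataclass
class Compressor:
    name: str
    ratio: float        # compression ratio (original size / compressed size)
    decomp_mbps: float  # decompression throughput in MB/s

def select_compressor(candidates, required_mbps):
    """Return the highest-ratio compressor whose decompression
    throughput meets the pipeline's required ingest rate, or None."""
    feasible = [c for c in candidates if c.decomp_mbps >= required_mbps]
    if not feasible:
        return None  # fall back to storing the data uncompressed
    return max(feasible, key=lambda c: c.ratio)

# Hypothetical candidates (numbers are illustrative, not measured).
candidates = [
    Compressor("lz4",  ratio=2.1, decomp_mbps=3000.0),
    Compressor("zstd", ratio=3.4, decomp_mbps=1200.0),
    Compressor("xz",   ratio=5.0, decomp_mbps=150.0),
]

# Synchronous I/O: decompression is on the critical path, so it
# must keep up with the accelerator's full consumption rate.
sync_choice = select_compressor(candidates, required_mbps=1000.0)

# Asynchronous I/O with prefetching overlaps decompression with
# computation, lowering the effective throughput requirement and
# permitting a slower, higher-ratio compressor.
async_choice = select_compressor(candidates, required_mbps=100.0)
```

With these illustrative numbers, the synchronous constraint selects the mid-ratio compressor while prefetching admits the slowest, highest-ratio one, which is exactly the capacity-versus-overhead tradeoff the abstract describes.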
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- DOE Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1804065
- Resource Relation:
- Conference: 34th IEEE International Parallel and Distributed Processing Symposium, 05/18/20 - 05/22/20, New Orleans, LA, US
- Country of Publication:
- United States
- Language:
- English