Efficient I/O for Neural Network Training with Compressed Data
FanStore is a shared object store that enables efficient and scalable neural network training on supercomputers. By providing a global cache layer on node-local burst buffers using a compressed representation, it significantly enhances the processing capability of deep learning (DL) applications on existing hardware. In addition, FanStore allows POSIX-compliant file access to the compressed data in user space. We investigate the tradeoff between runtime overhead and data compression ratio using real-world datasets and applications, and propose a compressor selection algorithm to maximize storage capacity under performance constraints. We consider both asynchronous (i.e., with prefetching) and synchronous I/O strategies, and propose mechanisms for selecting compressors for each. Using FanStore, the same storage hardware can host 2-13x more data for example applications without significant runtime overhead, and our experiments show that FanStore scales to 512 compute nodes with near-linear performance.
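The compressor selection idea described in the abstract can be illustrated with a minimal sketch: among candidate compressors, choose the one with the highest compression ratio whose decompression throughput still satisfies the training pipeline's ingest rate. Under synchronous I/O the full ingest rate must be met, whereas prefetching hides part of the decompression cost and relaxes the constraint. All names and throughput/ratio numbers below are hypothetical, for illustration only; this is not the paper's actual algorithm or measurements.

```python
from dataclasses import dataclass

@dataclass
class Compressor:
    name: str
    ratio: float        # compression ratio (original size / compressed size)
    decomp_mbps: float  # decompression throughput in MB/s

def select_compressor(candidates, required_mbps):
    """Return the highest-ratio compressor whose decompression
    throughput meets the pipeline's required ingest rate, or None."""
    feasible = [c for c in candidates if c.decomp_mbps >= required_mbps]
    if not feasible:
        return None  # fall back to storing the data uncompressed
    return max(feasible, key=lambda c: c.ratio)

# Hypothetical candidates (numbers are illustrative, not measured).
candidates = [
    Compressor("lz4",  ratio=2.1, decomp_mbps=3000.0),
    Compressor("zstd", ratio=3.4, decomp_mbps=1200.0),
    Compressor("xz",   ratio=5.0, decomp_mbps=150.0),
]

# Synchronous I/O: decompression is on the critical path, so it
# must keep up with the accelerator's full consumption rate.
sync_choice = select_compressor(candidates, required_mbps=1000.0)

# Asynchronous I/O with prefetching overlaps decompression with
# computation, lowering the effective throughput requirement and
# permitting a slower, higher-ratio compressor.
async_choice = select_compressor(candidates, required_mbps=100.0)
```

With these illustrative numbers, the synchronous constraint selects the mid-ratio compressor while prefetching admits the slowest, highest-ratio one, which is exactly the capacity-versus-overhead tradeoff the abstract describes.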
- Research Organization:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science - Office of Advanced Scientific Computing Research (ASCR); National Science Foundation (NSF)
- DOE Contract Number:
- AC02-06CH11357
- OSTI ID:
- 1804065
- Resource Relation:
- Conference: 34th IEEE International Parallel and Distributed Processing Symposium, 05/18/20 - 05/22/20, New Orleans, LA, US
- Country of Publication:
- United States
- Language:
- English