OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Deep Learning via I/O Analysis and Optimization

Journal Article · ACM Transactions on Parallel Computing
DOI: https://doi.org/10.1145/3331526 · OSTI ID: 1569281
Authors: Pumma, Sarunya [1]; Si, Min [2]; Feng, Wu-chun [1]; Balaji, Pavan [2]
  1. Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)

Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Lastly, our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.
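To make the access pattern under study concrete, the sketch below shows a baseline, Caffe-style read loop over a training set stored in LMDB, using the standard `lmdb` Python bindings. It is a minimal illustration of the I/O path the article analyzes, not the authors' LMDBIO plugin; the database path, batch size, and the `read_batches` helper are illustrative assumptions.

```python
# Minimal sketch of the baseline LMDB access pattern: each training process
# opens the shared database read-only and walks it with a cursor, one
# serialized record at a time. Path, batch size, and decoding are illustrative.
import lmdb

DB_PATH = "./train_lmdb"   # hypothetical path to an LMDB-formatted training set
BATCH_SIZE = 64            # illustrative mini-batch size


def read_batches(db_path=DB_PATH, batch_size=BATCH_SIZE):
    # readonly=True and lock=False mirror how multiple readers typically
    # share one memory-mapped LMDB file during training.
    env = lmdb.open(db_path, readonly=True, lock=False, readahead=False)
    with env.begin(buffers=True) as txn:
        batch = []
        for key, value in txn.cursor():
            batch.append(bytes(value))   # raw serialized record (e.g., a Caffe Datum)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch
    env.close()


if __name__ == "__main__":
    for i, batch in enumerate(read_batches()):
        print(f"batch {i}: {len(batch)} records")
        if i == 2:   # stop early in this demo
            break
```

At scale, every process performing this kind of cursor walk over the same memory-mapped file on shared storage is exactly the access pattern whose cost the article measures and that LMDBIO is designed to reduce.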

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC02-06CH11357
OSTI ID:
1569281
Journal Information:
ACM Transactions on Parallel Computing, Vol. 6, Issue 2; ISSN 2329-4949
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English

