DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Deep Learning via I/O Analysis and Optimization

Abstract

Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Lastly, our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.
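
For context on the access pattern under discussion: below is a minimal sketch, in Python, of how a Caffe-style data layer walks an LMDB database with a single sequential cursor. It is not code from the paper; the database path, batch size, and batching logic are illustrative assumptions, and only the standard Python lmdb bindings are used.

    # A minimal sketch (not from the paper) of the sequential, cursor-based
    # read pattern that Caffe-style data layers use with LMDB. The database
    # path "train_lmdb" and BATCH_SIZE are illustrative assumptions.
    import lmdb

    BATCH_SIZE = 256  # records handed to the trainer per iteration (assumed)

    # readonly=True and lock=False mirror how reader processes typically
    # attach to a shared, pre-built training database.
    env = lmdb.open("train_lmdb", readonly=True, lock=False)

    with env.begin() as txn:
        batch = []
        for key, value in txn.cursor():  # strictly sequential key order
            batch.append(value)          # value: one serialized training record
            if len(batch) == BATCH_SIZE:
                # ... deserialize and feed the batch to training here ...
                batch = []

    env.close()

Because each record is reached only by advancing this cursor, reads are inherently serialized per process, which helps illustrate why, once computation and communication are well parallelized, the article finds the I/O subsystem to be the dominant scaling bottleneck.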

Authors:
 Pumma, Sarunya [1]; Si, Min [2]; Feng, Wu-Chun [1]; Balaji, Pavan [2]
  1. Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
  2. Argonne National Lab. (ANL), Argonne, IL (United States)
Publication Date:
July 1, 2019
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1569281
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Parallel Computing
Additional Journal Information:
Journal Volume: 6; Journal Issue: 2; Journal ID: ISSN 2329-4949
Publisher:
Association for Computing Machinery
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; caffe; I/O bottleneck; I/O in deep learning; LMDB; LMDBIO; scalable deep learning; parallel I/O

Citation Formats

Pumma, Sarunya, Si, Min, Feng, Wu-Chun, and Balaji, Pavan. Scalable Deep Learning via I/O Analysis and Optimization. United States: N. p., 2019. Web. doi:10.1145/3331526.
Pumma, Sarunya, Si, Min, Feng, Wu-Chun, & Balaji, Pavan. Scalable Deep Learning via I/O Analysis and Optimization. United States. https://doi.org/10.1145/3331526
Pumma, Sarunya, Si, Min, Feng, Wu-Chun, and Balaji, Pavan. 2019. "Scalable Deep Learning via I/O Analysis and Optimization". United States. https://doi.org/10.1145/3331526. https://www.osti.gov/servlets/purl/1569281.
@article{osti_1569281,
title = {Scalable Deep Learning via I/O Analysis and Optimization},
author = {Pumma, Sarunya and Si, Min and Feng, Wu-Chun and Balaji, Pavan},
abstractNote = {Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Lastly, our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.},
doi = {10.1145/3331526},
journal = {ACM Transactions on Parallel Computing},
number = 2,
volume = 6,
place = {United States},
year = {2019},
month = {jul}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Figures / Tables:

Fig. 1: Caffe’s data-parallel workflow

Works referenced in this record:

Towards Scalable Deep Learning via I/O Analysis and Optimization
conference, December 2017

  • Pumma, Sarunya; Si, Min; Feng, Wu-chun
  • 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • DOI: 10.1109/HPCC-SmartCity-DSS.2017.29

NetCDF: an interface for scientific data access
journal, July 1990

  • Rew, R.; Davis, G.
  • IEEE Computer Graphics and Applications, Vol. 10, Issue 4
  • DOI: 10.1109/38.56302

Parallel I/O Optimizations for Scalable Deep Learning
conference, December 2017

  • Pumma, Sarunya; Si, Min; Feng, Wu-chun
  • 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)
  • DOI: 10.1109/ICPADS.2017.00097

A Case for Using MPI's Derived Datatypes to Improve I/O Performance
conference, January 1998

  • Thakur, R.; Gropp, W.; Lusk, E.
  • Proceedings of the IEEE/ACM SC98 Conference: High Performance Networking and Computing
  • DOI: 10.1109/SC.1998.10006

FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters
conference, June 2016

  • Iandola, Forrest N.; Moskewicz, Matthew W.; Ashraf, Khalid
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.284

ImageNet Training in Minutes
conference, January 2018

  • You, Yang; Zhang, Zhao; Hsieh, Cho-Jui
  • Proceedings of the 47th International Conference on Parallel Processing - ICPP 2018
  • DOI: 10.1145/3225058.3225069

Wide Residual Networks
conference, January 2016

  • Zagoruyko, Sergey; Komodakis, Nikos
  • Proceedings of the British Machine Vision Conference 2016
  • DOI: 10.5244/C.30.87

TýrFS: Increasing Small Files Access Performance with Dynamic Metadata Replication
conference, May 2018

  • Matri, Pierre; Perez, Maria S.; Costan, Alexandru
  • 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
  • DOI: 10.1109/CCGRID.2018.00072

Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems
conference, September 2018

  • Zhu, Yue; Chowdhury, Fahim; Fu, Huansong
  • 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
  • DOI: 10.1109/MASCOTS.2018.00023

ImageNet Large Scale Visual Recognition Challenge
journal, April 2015

  • Russakovsky, Olga; Deng, Jia; Su, Hao
  • International Journal of Computer Vision, Vol. 115, Issue 3
  • DOI: 10.1007/s11263-015-0816-y

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters
conference, January 2017

  • Awan, Ammar Ahmad; Hamidouche, Khaled; Hashmi, Jahanzeb Maqbool
  • Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '17)
  • DOI: 10.1145/3018743.3018769

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes
journal, February 2018

  • Azarkhish, Erfan; Rossi, Davide; Loi, Igor
  • IEEE Transactions on Parallel and Distributed Systems, Vol. 29, Issue 2
  • DOI: 10.1109/TPDS.2017.2752706

Characterizing Deep-Learning I/O Workloads in TensorFlow
conference, November 2018

  • Chien, Steven W. D.; Markidis, Stefano; Sishtla, Chaitanya Prasad
  • 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS)
  • DOI: 10.1109/PDSW-DISCS.2018.00011

ImageNet: A large-scale hierarchical image database
conference, June 2009

  • Deng, Jia; Dong, Wei; Socher, Richard
  • 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2009.5206848

Deep Residual Learning for Image Recognition
conference, June 2016

  • He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • DOI: 10.1109/CVPR.2016.90

Process-in-process: techniques for practical address-space sharing
conference, June 2018

  • Hori, Atsushi; Si, Min; Gerofi, Balazs
  • Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18)
  • DOI: 10.1145/3208040.3208045

Exascale Deep Learning for Climate Analytics
conference, November 2018

  • Kurth, Thorsten; Treichler, Sean; Romero, Joshua
  • SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2018.00054

Parallel netCDF: A High-Performance Scientific I/O Interface
conference, January 2003

  • Li, Jianwei; Zingale, Michael; Liao, Wei-keng
  • Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03
  • DOI: 10.1145/1048935.1050189

CosmoFlow: Using Deep Learning to Learn the Universe at Scale
conference, November 2018

  • Mathuriya, Amrita; Bard, Deborah; Mendygral, Peter
  • SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2018.00068

ImageNet Large Scale Visual Recognition Challenge
text, January 2015

  • Deng, Jia; Karpathy, Andrej; Ma, Sean
  • The University of North Carolina at Chapel Hill University Libraries
  • DOI: 10.17615/009h-3a34

Wide Residual Networks
preprint, January 2016

  • Zagoruyko, Sergey; Komodakis, Nikos