Scalable Deep Learning via I/O Analysis and Optimization

Pumma, Sarunya; Si, Min; Feng, Wu-Chun; Balaji, Pavan

doi:10.1145/3331526

Journal Article · July 1, 2019 · ACM Transactions on Parallel Computing

DOI: https://doi.org/10.1145/3331526 · OSTI ID: 1569281

Pumma, Sarunya [1]; Si, Min [2]; Feng, Wu-Chun [1]; Balaji, Pavan [2]

[1] Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
[2] Argonne National Lab. (ANL), Argonne, IL (United States)

Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Lastly, our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.

Research Organization: Argonne National Lab. (ANL), Argonne, IL (United States)

Sponsoring Organization: National Science Foundation (NSF); USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)

Grant/Contract Number: AC02-06CH11357

OSTI ID: 1569281

Journal Information: ACM Transactions on Parallel Computing, Vol. 6, Issue 2; ISSN 2329-4949

Publisher: Association for Computing Machinery

Country of Publication: United States

Language: English
