OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Deep Learning via I/O Analysis and Optimization

Abstract

Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.
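The abstract's claim that I/O can consume up to 90% of total training time comes from the authors' analysis of LMDB-based training. As an illustration only (not code from the paper), the I/O share of a training loop can be estimated by timing the data-loading and compute phases separately; the sketch below uses synthetic stand-ins (`load_batch`, `train_step` are hypothetical placeholders, with sleeps simulating a slow I/O path and fast, well-parallelized compute):

```python
import time

def load_batch():
    # Stand-in for a data read (e.g., from an LMDB-style store);
    # the sleep simulates a slow I/O path.
    time.sleep(0.009)
    return [0] * 64

def train_step(batch):
    # Stand-in for the forward/backward compute of one mini-batch.
    time.sleep(0.001)

io_time = compute_time = 0.0
for _ in range(20):
    t0 = time.perf_counter()
    batch = load_batch()          # I/O phase
    t1 = time.perf_counter()
    train_step(batch)             # compute phase
    t2 = time.perf_counter()
    io_time += t1 - t0
    compute_time += t2 - t1

io_fraction = io_time / (io_time + compute_time)
print(f"I/O fraction of training time: {io_fraction:.0%}")
```

With these synthetic timings the I/O fraction lands near 90%, mirroring the regime the paper describes: once compute is effectively parallelized, data I/O dominates the per-step cost.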

Authors:
Pumma, Sarunya; Si, Min; Feng, Wu-Chun; Balaji, Pavan
Publication Date:
September 2019
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science - Office of Advanced Scientific Computing Research; National Science Foundation (NSF)
OSTI Identifier:
1569281
DOE Contract Number:
AC02-06CH11357
Resource Type:
Journal Article
Journal Name:
ACM Transactions on Parallel Computing
Additional Journal Information:
Journal Volume: 6; Journal Issue: 2
Country of Publication:
United States
Language:
English
Subject:
Caffe; I/O bottleneck; I/O in deep learning; LMDB; LMDBIO; Scalable deep learning; parallel I/O

Citation Formats

Pumma, Sarunya, Si, Min, Feng, Wu-Chun, and Balaji, Pavan. Scalable Deep Learning via I/O Analysis and Optimization. United States: N. p., 2019. Web. doi:10.1145/3331526.
Pumma, Sarunya, Si, Min, Feng, Wu-Chun, & Balaji, Pavan. Scalable Deep Learning via I/O Analysis and Optimization. United States. doi:10.1145/3331526.
Pumma, Sarunya, Si, Min, Feng, Wu-Chun, and Balaji, Pavan. "Scalable Deep Learning via I/O Analysis and Optimization". United States. doi:10.1145/3331526.
@article{osti_1569281,
title = {Scalable Deep Learning via I/O Analysis and Optimization},
author = {Pumma, Sarunya and Si, Min and Feng, Wu-Chun and Balaji, Pavan},
abstractNote = {Scalable deep neural network training has been gaining prominence because of the increasing importance of deep learning in a multitude of scientific and commercial domains. Consequently, a number of researchers have investigated techniques to optimize deep learning systems. Much of the prior work has focused on runtime and algorithmic enhancements to optimize the computation and communication. Despite these enhancements, however, deep learning systems still suffer from scalability limitations, particularly with respect to data I/O. This situation is especially true for training models where the computation can be effectively parallelized, leaving I/O as the major bottleneck. In fact, our analysis shows that I/O can take up to 90% of the total training time. Thus, in this article, we first analyze LMDB, the most widely used I/O subsystem of deep learning frameworks, to understand the causes of this I/O inefficiency. Based on our analysis, we propose LMDBIO—an optimized I/O plugin for scalable deep learning. LMDBIO includes six novel optimizations that together address the various shortcomings in existing I/O for deep learning. Our experimental results show that LMDBIO significantly outperforms LMDB in all cases and improves overall application performance by up to 65-fold on a 9,216-core system.},
doi = {10.1145/3331526},
journal = {ACM Transactions on Parallel Computing},
number = {2},
volume = {6},
place = {United States},
year = {2019},
month = {9}
}