OSTI.GOV, U.S. Department of Energy, Office of Scientific and Technical Information

Title: Towards Scalable Parallel Training of Deep Neural Networks

Authors:
Jacobs, S A; Dryden, N; Pearce, R; Van Essen, B
Publication Date:
August 2017
Research Org.:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1410093
Report Number(s):
LLNL-CONF-737759
DOE Contract Number:
AC52-07NA27344
Resource Type:
Conference
Resource Relation:
Conference: Machine Learning in HPC Workshop (MLHPC), held at Supercomputing (SC17), Denver, CO, United States, Nov 12-17, 2017
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE

Citation Formats

Jacobs, S A, Dryden, N, Pearce, R, and Van Essen, B. Towards Scalable Parallel Training of Deep Neural Networks. United States: N. p., 2017. Web. doi:10.1145/3146347.3146353.
Jacobs, S A, Dryden, N, Pearce, R, & Van Essen, B. Towards Scalable Parallel Training of Deep Neural Networks. United States. doi:10.1145/3146347.3146353.
Jacobs, S A, Dryden, N, Pearce, R, and Van Essen, B. 2017. "Towards Scalable Parallel Training of Deep Neural Networks". United States. doi:10.1145/3146347.3146353. https://www.osti.gov/servlets/purl/1410093.
@article{osti_1410093,
title = {Towards Scalable Parallel Training of Deep Neural Networks},
author = {Jacobs, S A and Dryden, N and Pearce, R and Van Essen, B},
abstractNote = {},
doi = {10.1145/3146347.3146353},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2017,
month = 8
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar records:
  • This study presents a principled empirical evaluation of image storage systems for training deep neural networks. We employ the Caffe deep learning framework to train neural network models for three different data sets: MNIST, CIFAR-10, and ImageNet. While training the models, we evaluate five different options for retrieving training image data: (1) PNG-formatted image files on a local file system; (2) pushing pixel arrays from image files into a single HDF5 file on a local file system; (3) in-memory arrays holding the pixel arrays in Python and C++; (4) loading the training data into LevelDB, a key-value store based on a log-structured merge tree; and (5) loading the training data into LMDB, a key-value store based on a B+ tree. The experimental results quantitatively highlight the disadvantage of using ordinary image files on local file systems to train deep neural networks and demonstrate reliable performance with key-value storage back-ends. When training a model on the ImageNet dataset, the image file option was more than 17 times slower than the key-value storage option. Along with measurements of training time, this study provides an in-depth analysis of the causes of the performance advantages and disadvantages of each back-end. We envision that the provided measurements and analysis will shed light on the optimal way to architect systems for training neural networks in a scalable manner. (A minimal LMDB-loading sketch appears after this list.)
  • Deep learning systems have been growing in prominence as a way to automatically characterize objects, trends, and anomalies. Given the importance of deep learning systems, researchers have been investigating techniques to optimize them. An area of particular interest has been using large supercomputing systems to quickly generate effective deep learning networks, a phase often referred to as “training” the deep neural network. As we scale existing deep learning frameworks, such as Caffe, on these large supercomputing systems, we notice that the parallelism can improve the computation tremendously, leaving data I/O as the major bottleneck limiting overall system scalability. In this paper, we first present a detailed analysis of the performance bottlenecks of Caffe on large supercomputing systems. Our analysis shows that the I/O subsystem of Caffe, LMDB, relies on memory-mapped I/O to access its database, which can be highly inefficient on large-scale systems because of its interaction with the process scheduler and the network-based parallel file system. Based on this analysis, we then present LMDBIO, our optimized I/O plugin for Caffe that takes Caffe's data access pattern into account in order to vastly improve I/O performance. Our experimental results show that LMDBIO can improve the overall execution time of Caffe by nearly 20-fold in some cases. (An illustrative sketch of the mmap-versus-explicit-read access patterns follows this list.)
  • Abstract not provided.
  • Recent work has demonstrated the use of the extended Kalman filter (EKF) as an alternative to gradient-descent backpropagation when training multi-layer perceptrons. The EKF approach significantly improves convergence properties, but at the cost of greater storage and computational complexity. Feldkamp et al. have described a decoupled version of the EKF which preserves the training advantages of the general EKF while reducing the storage and computational requirements. This paper reviews the general and decoupled EKF approaches and presents sequentialized versions which provide further computational savings over the batch forms. The usefulness of the sequentialized EKF algorithms is demonstrated on a pattern classification problem. (A toy per-sample EKF weight-update sketch follows this list.)
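
To make the storage-system comparison in the first abstract concrete, the following minimal Python sketch illustrates option (5): packing a directory of image files into an LMDB key-value store of the kind Caffe can read during training. The directory name, key scheme, and map_size are assumptions made for this example, not details taken from the study; a real Caffe pipeline would also serialize each record as a Datum protocol buffer rather than storing raw file bytes.

    import lmdb                      # pip install lmdb
    from pathlib import Path

    # Hypothetical paths; substitute your own dataset and output locations.
    IMAGE_DIR = Path("train_images")
    DB_PATH = "train_lmdb"

    # map_size is an upper bound on the size of the memory map backing the DB.
    env = lmdb.open(DB_PATH, map_size=1 << 34)   # ~16 GiB

    with env.begin(write=True) as txn:
        for idx, img_file in enumerate(sorted(IMAGE_DIR.glob("*.png"))):
            # Store the encoded image bytes under a fixed-width numeric key
            # so records can be streamed back in a deterministic order.
            txn.put(f"{idx:08d}".encode("ascii"), img_file.read_bytes())

    env.sync()
    env.close()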
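
The second abstract attributes Caffe's scaling bottleneck to LMDB's reliance on memory-mapped I/O. The sketch below is not LMDBIO itself; it is a small, assumption-laden Python illustration of the underlying access-pattern difference: touching data through a memory map generates many page faults, whereas explicit large block reads (os.pread) issue fewer, bigger requests that a network-based parallel file system can service more efficiently. The file path and block size are placeholders.

    import mmap
    import os

    DATA_FILE = "train_lmdb/data.mdb"   # hypothetical LMDB data file
    BLOCK = 1 << 20                     # 1 MiB explicit reads

    def scan_with_mmap(path):
        # Touch one byte per page through a memory map: every miss becomes a
        # page fault, which interacts poorly with the process scheduler and a
        # network-based parallel file system.
        total = 0
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                for off in range(0, len(mm), mmap.PAGESIZE):
                    total += mm[off]
        return total

    def scan_with_pread(path):
        # Issue large explicit reads instead: fewer, bigger I/O requests.
        total = 0
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(fd).st_size
            for off in range(0, size, BLOCK):
                buf = os.pread(fd, BLOCK, off)
                total += buf[0] if buf else 0
        finally:
            os.close(fd)
        return total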
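
Finally, the EKF abstract can be illustrated with a toy example. The NumPy sketch below applies a sequential (per-sample) extended Kalman filter update to the weights of a single sigmoid neuron. It is a generic simplification with assumed noise parameters R and Q, not the decoupled or sequentialized algorithms of Feldkamp et al. or of the paper itself.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ekf_step(w, P, x, d, R=0.1, Q=1e-4):
        # One EKF update treating the weights w as the filter state:
        # predict the output, linearize around the current weights, then
        # apply the Kalman gain to correct w and its covariance P.
        y = sigmoid(w @ x)                        # forward pass (scalar output)
        H = (y * (1.0 - y) * x)[None, :]          # Jacobian dy/dw, shape (1, n)
        S = H @ P @ H.T + R                       # innovation covariance (1 x 1)
        K = P @ H.T / S                           # Kalman gain, shape (n, 1)
        w = w + (K * (d - y)).ravel()             # weight (state) update
        P = P - K @ H @ P + Q * np.eye(len(w))    # covariance update + process noise
        return w, P

    # Toy usage: learn a linearly separable target one sample at a time.
    rng = np.random.default_rng(0)
    n = 3
    w, P = np.zeros(n), np.eye(n)
    for _ in range(200):
        x = rng.normal(size=n)
        d = 1.0 if x[0] + x[1] > 0.0 else 0.0
        w, P = ekf_step(w, P, x, d)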