CosmoFlow: Using Deep Learning to Learn the Universe at Scale
Journal Article · International Conference for High Performance Computing, Networking, Storage and Analysis
- Author Affiliations:
- Intel Corp., Hillsboro, OR (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Cray Inc., Seattle, WA (United States)
- Univ. of California, Berkeley, CA (United States)
- Intel Corp., Santa Clara, CA (United States)
- Flatiron Inst., New York, NY (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Carnegie Mellon Univ., Pittsburgh, PA (United States)
Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. These enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters ΩM, σ8 and ns with unprecedented accuracy.
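The fully synchronous data-parallel scheme the abstract describes can be illustrated with a minimal sketch: each worker computes a gradient on its own data shard, an allreduce averages the gradients, and every replica applies the identical update so the model weights never diverge. The toy one-parameter model, worker shards, and learning rate below are hypothetical, chosen only to show the mechanism; this is not the CosmoFlow network or the Cray plugin's actual allreduce.

```python
def grad(w, xs, ys):
    # dL/dw for L = mean((w*x - y)^2) over this worker's shard
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def sync_step(w, shards, lr=0.05):
    # Each worker computes a gradient on its local shard ...
    local = [grad(w, xs, ys) for xs, ys in shards]
    # ... then an allreduce averages them, so every replica
    # applies the identical update (fully synchronous SGD).
    g = sum(local) / len(local)
    return w - lr * g

# Two hypothetical workers, each holding half of the data for y = 3x.
shards = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]
w = 0.0
for _ in range(200):
    w = sync_step(w, shards)
print(round(w, 3))  # → 3.0
```

Because the averaged gradient is identical on every replica, this scheme is statistically equivalent to single-worker SGD on the full batch, which is what makes the parallel-efficiency claim of synchronous training well defined.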
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC)
- Grant/Contract Number:
- AC02-05CH11231
- OSTI ID:
- 1510756
- Journal Information:
- International Conference for High Performance Computing, Networking, Storage and Analysis, Vol. 2018; ISSN 2167-4329
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English
Similar Records
Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores
Journal Article · 2017 · Concurrency and Computation: Practice and Experience · OSTI ID: 1459400
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Journal Article · 2020 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID: 1959404
Cataloging the visible universe through Bayesian inference in Julia at petascale
Journal Article · 2019 · Journal of Parallel and Distributed Computing · OSTI ID: 1527362