DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

This content will become publicly available on March 14, 2020

Title: CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Abstract

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. Here, these enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters Ω_M, σ_8, and n_s with unprecedented accuracy.

Authors:
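Mathuriya, Amrita [1]; Bard, Deborah [2]; Mendygral, Peter [3]; Meadows, Lawrence [1]; Arnemann, James [4]; Shao, Lei [5]; He, Siyu [6]; Karna, Tuomas [1]; Moise, Diana [3]; Pennycook, Simon J. [5]; Maschhoff, Kristyn [3]; Sewall, Jason [5]; Kumar, Nalini [5]; Ho, Shirley [6]; Ringenburg, Michael F. [3]; Prabhat, Prabhat [2]; Lee, Victor [5]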
  1. Intel Corp., Hillsboro, OR (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
  3. Cray Inc., Seattle, WA (United States)
  4. Univ. of California, Berkeley, CA (United States)
  5. Intel Corp., Santa Clara, CA (United States)
  6. Flatiron Inst., New York, NY (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Carnegie Mellon Univ., Pittsburgh, PA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1510756
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
International Conference for High Performance Computing, Networking, Storage and Analysis
Additional Journal Information:
Journal Volume: 2018; Conference: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX (United States), 11-16 Nov 2018; Journal ID: ISSN 2167-4329
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
98 NUCLEAR DISARMAMENT, SAFEGUARDS, AND PHYSICAL PROTECTION; Cosmology; Deep Learning; Machine Learning; TensorFlow; High Performance Computing

Citation Formats

Mathuriya, Amrita, Bard, Deborah, Mendygral, Peter, Meadows, Lawrence, Arnemann, James, Shao, Lei, He, Siyu, Karna, Tuomas, Moise, Diana, Pennycook, Simon J., Maschhoff, Kristyn, Sewall, Jason, Kumar, Nalini, Ho, Shirley, Ringenburg, Michael F., Prabhat, Prabhat, and Lee, Victor. CosmoFlow: Using Deep Learning to Learn the Universe at Scale. United States: N. p., 2019. Web. doi:10.1109/sc.2018.00068.
Mathuriya, Amrita, Bard, Deborah, Mendygral, Peter, Meadows, Lawrence, Arnemann, James, Shao, Lei, He, Siyu, Karna, Tuomas, Moise, Diana, Pennycook, Simon J., Maschhoff, Kristyn, Sewall, Jason, Kumar, Nalini, Ho, Shirley, Ringenburg, Michael F., Prabhat, Prabhat, & Lee, Victor. CosmoFlow: Using Deep Learning to Learn the Universe at Scale. United States. doi:10.1109/sc.2018.00068.
Mathuriya, Amrita, Bard, Deborah, Mendygral, Peter, Meadows, Lawrence, Arnemann, James, Shao, Lei, He, Siyu, Karna, Tuomas, Moise, Diana, Pennycook, Simon J., Maschhoff, Kristyn, Sewall, Jason, Kumar, Nalini, Ho, Shirley, Ringenburg, Michael F., Prabhat, Prabhat, and Lee, Victor. 2019. "CosmoFlow: Using Deep Learning to Learn the Universe at Scale". United States. doi:10.1109/sc.2018.00068.
@article{osti_1510756,
title = {CosmoFlow: Using Deep Learning to Learn the Universe at Scale},
author = {Mathuriya, Amrita and Bard, Deborah and Mendygral, Peter and Meadows, Lawrence and Arnemann, James and Shao, Lei and He, Siyu and Karna, Tuomas and Moise, Diana and Pennycook, Simon J. and Maschhoff, Kristyn and Sewall, Jason and Kumar, Nalini and Ho, Shirley and Ringenburg, Michael F. and Prabhat, Prabhat and Lee, Victor},
abstractNote = {Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of the TensorFlow framework. CosmoFlow uses efficient implementations of 3D convolution and pooling primitives, together with improvements in threading for many element-wise operations, to improve training performance on Intel® Xeon Phi™ processors. We also utilize the Cray PE Machine Learning Plugin for efficient scaling to multiple nodes. We demonstrate fully synchronous data-parallel training on 8192 nodes of Cori with 77\% parallel efficiency, achieving 3.5 Pflop/s sustained performance. To our knowledge, this is the first large-scale science application of the TensorFlow framework at supercomputer scale with fully synchronous training. Here, these enhancements enable us to process large 3D dark matter distributions and predict the cosmological parameters $\Omega_M$, $\sigma_8$ and $n_s$ with unprecedented accuracy.},
doi = {10.1109/sc.2018.00068},
journal = {International Conference for High Performance Computing, Networking, Storage and Analysis},
volume = {2018},
place = {United States},
year = {2019},
month = {3}
}
