OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems

Abstract

CORAL, the Collaboration of Oak Ridge, Argonne and Livermore, is fielding two similar IBM systems, Summit and Sierra, with NVIDIA GPUs that will replace the existing Titan and Sequoia systems. Summit and Sierra are currently ranked No. 1 and No. 3, respectively, on the Top500 list. We discuss the design and key differences of the systems. Our evaluation highlights the following. Applications that fit in high-bandwidth memory (HBM) see the most benefit and may prefer more GPUs; for some applications, however, CPU-GPU bandwidth is more important than the number of GPUs. The node-local burst buffer scales linearly and can achieve a 4X improvement over the parallel file system (PFS) for large jobs; smaller jobs, however, may benefit from writing directly to the PFS. Finally, several CPU-, network-, and memory-bound analytics codes and GPU-bound deep learning codes achieve up to an 11X and a 79X speedup per node, respectively, over Titan.

Authors:
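Vazhkudai, Sudharshan [1]; de Supinski, Bronis R. [2]; Bland, Buddy [1]; Geist II, Al [1]; Grinberg, Leopold [3]; Yin, Junqi [1]; Sexton, James [3]; Kahle, Jim [3]; Zimmer, Christopher [1]; Atchley, Scott [1]; Oral, H [1]; Maxwell, Don [1]; Melesse Vergara, Veronica [1]; Bertsch, Adam [2]; Goldstone, Robin [2]; Joubert, Wayne [1]; Chambreau, Chris [2]; Applehans, David [3]; Blackmore, Robert [3]; Casses, Ben [2]; Chochia, George [3]; Davison, Gene [3]; Ezell, Matthew [1]; Gooding, Tom [3]; Gonsiorowski, Elsa [2]; Hanson, Bill [3]; Hartner, Bill [3]; Karlin, Ian [2]; Leininger, Matthew [2]; Leverman, Dustin B. [1]; Marroquin, Chris [3]; Moody, Adam [2]; Ohmacht, Martin [3]; Pankajakshan, Ramesh [2]; Pizzano, Fernando [3]; Rogers II, Jim [1]; Rosenburg, Bryan [3]; Schmidt, Drew [1]; Shankar, Mallikarjun [1]; Wang, Feiyi [1]; Watson, Py [2]; Walkup, Bob [3]; Weems, Lance [2]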
  1. Oak Ridge National Laboratory (ORNL)
  2. Lawrence Livermore National Laboratory (LLNL)
  3. IBM
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1564142
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, Texas, United States of America, November 11-16, 2018
Country of Publication:
United States
Language:
English

Citation Formats

Vazhkudai, Sudharshan, de Supinski, Bronis R., Bland, Buddy, Geist II, Al, Grinberg, Leopold, Yin, Junqi, Sexton, James, Kahle, Jim, Zimmer, Christopher, Atchley, Scott, Oral, H, Maxwell, Don, Melesse Vergara, Veronica, Bertsch, Adam, Goldstone, Robin, Joubert, Wayne, Chambreau, Chris, Applehans, David, Blackmore, Robert, Casses, Ben, Chochia, George, Davison, Gene, Ezell, Matthew, Gooding, Tom, Gonsiorowski, Elsa, Hanson, Bill, Hartner, Bill, Karlin, Ian, Leininger, Matthew, Leverman, Dustin B., Marroquin, Chris, Moody, Adam, Ohmacht, Martin, Pankajakshan, Ramesh, Pizzano, Fernando, Rogers II, Jim, Rosenburg, Bryan, Schmidt, Drew, Shankar, Mallikarjun, Wang, Feiyi, Watson, Py, Walkup, Bob, and Weems, Lance. The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems. United States: N. p., 2018. Web. doi:10.1109/SC.2018.00055.
Vazhkudai, Sudharshan, de Supinski, Bronis R., Bland, Buddy, Geist II, Al, Grinberg, Leopold, Yin, Junqi, Sexton, James, Kahle, Jim, Zimmer, Christopher, Atchley, Scott, Oral, H, Maxwell, Don, Melesse Vergara, Veronica, Bertsch, Adam, Goldstone, Robin, Joubert, Wayne, Chambreau, Chris, Applehans, David, Blackmore, Robert, Casses, Ben, Chochia, George, Davison, Gene, Ezell, Matthew, Gooding, Tom, Gonsiorowski, Elsa, Hanson, Bill, Hartner, Bill, Karlin, Ian, Leininger, Matthew, Leverman, Dustin B., Marroquin, Chris, Moody, Adam, Ohmacht, Martin, Pankajakshan, Ramesh, Pizzano, Fernando, Rogers II, Jim, Rosenburg, Bryan, Schmidt, Drew, Shankar, Mallikarjun, Wang, Feiyi, Watson, Py, Walkup, Bob, & Weems, Lance. The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems. United States. doi:10.1109/SC.2018.00055.
Vazhkudai, Sudharshan, de Supinski, Bronis R., Bland, Buddy, Geist II, Al, Grinberg, Leopold, Yin, Junqi, Sexton, James, Kahle, Jim, Zimmer, Christopher, Atchley, Scott, Oral, H, Maxwell, Don, Melesse Vergara, Veronica, Bertsch, Adam, Goldstone, Robin, Joubert, Wayne, Chambreau, Chris, Applehans, David, Blackmore, Robert, Casses, Ben, Chochia, George, Davison, Gene, Ezell, Matthew, Gooding, Tom, Gonsiorowski, Elsa, Hanson, Bill, Hartner, Bill, Karlin, Ian, Leininger, Matthew, Leverman, Dustin B., Marroquin, Chris, Moody, Adam, Ohmacht, Martin, Pankajakshan, Ramesh, Pizzano, Fernando, Rogers II, Jim, Rosenburg, Bryan, Schmidt, Drew, Shankar, Mallikarjun, Wang, Feiyi, Watson, Py, Walkup, Bob, and Weems, Lance. 2018. "The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems". United States. doi:10.1109/SC.2018.00055. https://www.osti.gov/servlets/purl/1564142.
@article{osti_1564142,
title = {The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems},
author = {Vazhkudai, Sudharshan and de Supinski, Bronis R. and Bland, Buddy and Geist II, Al and Grinberg, Leopold and Yin, Junqi and Sexton, James and Kahle, Jim and Zimmer, Christopher and Atchley, Scott and Oral, H and Maxwell, Don and Melesse Vergara, Veronica and Bertsch, Adam and Goldstone, Robin and Joubert, Wayne and Chambreau, Chris and Applehans, David and Blackmore, Robert and Casses, Ben and Chochia, George and Davison, Gene and Ezell, Matthew and Gooding, Tom and Gonsiorowski, Elsa and Hanson, Bill and Hartner, Bill and Karlin, Ian and Leininger, Matthew and Leverman, Dustin B. and Marroquin, Chris and Moody, Adam and Ohmacht, Martin and Pankajakshan, Ramesh and Pizzano, Fernando and Rogers II, Jim and Rosenburg, Bryan and Schmidt, Drew and Shankar, Mallikarjun and Wang, Feiyi and Watson, Py and Walkup, Bob and Weems, Lance},
abstractNote = {CORAL, the Collaboration of Oak Ridge, Argonne and Livermore, is fielding two similar IBM systems, Summit and Sierra, with NVIDIA GPUs that will replace the existing Titan and Sequoia systems. Summit and Sierra are currently ranked No. 1 and No. 3, respectively, on the Top500 list. We discuss the design and key differences of the systems. Our evaluation highlights the following. Applications that fit in high-bandwidth memory (HBM) see the most benefit and may prefer more GPUs; for some applications, however, CPU-GPU bandwidth is more important than the number of GPUs. The node-local burst buffer scales linearly and can achieve a 4X improvement over the parallel file system (PFS) for large jobs; smaller jobs, however, may benefit from writing directly to the PFS. Finally, several CPU-, network-, and memory-bound analytics codes and GPU-bound deep learning codes achieve up to an 11X and a 79X speedup per node, respectively, over Titan.},
doi = {10.1109/SC.2018.00055},
place = {United States},
year = {2018},
month = {11}
}
