OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Hopper System: How the Largest XE6 in the World Went From Requirements to Reality.

Abstract

This paper will discuss the entire process of acquiring and deploying Hopper, from the first vendor market surveys to providing 3.8 million hours of production cycles per day for NERSC users. Installing the latest system at NERSC has been both a logistical and technical adventure. Balancing compute requirements with power, cooling, and space limitations drove the initial choice and configuration of the XE6, and a number of first-of-a-kind features implemented in collaboration with Cray have resulted in a high-performance, usable, and reliable system.
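
As a rough reading of the "3.8 million hours per day" figure, the sketch below is a back-of-the-envelope check, not taken from the paper itself; it assumes the abstract means core-hours, and the core count referenced in the comments is the commonly cited size of the Hopper XE6 rather than anything stated in this record.

# Back-of-the-envelope check of "3.8 million hours of production cycles per day".
# Assumptions (not stated in this record): "hours" means core-hours, and
# ~153,000 compute cores is the commonly cited size of the Hopper XE6.
core_hours_per_day = 3.8e6
implied_cores_busy = core_hours_per_day / 24      # cores that would have to run 24 h/day
print(f"Implied cores busy around the clock: {implied_cores_busy:,.0f}")   # ~158,000
# That is on the order of the machine's full complement of compute cores,
# i.e. the figure amounts to keeping essentially the entire XE6 busy continuously.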

Authors:
 Antypas, Katie [1]; Butler, Tina [1]; Carter, Jonathan [1]
  1. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). NERSC Div.
Publication Date:
June 28, 2017
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1398498
DOE Contract Number:
AC02-05CH11231
Resource Type:
Conference
Resource Relation:
Conference: Cray User Group Meeting, Fairbanks, AK (United States), 23-26 May 2011
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Antypas, Katie, Butler, Tina, and Carter, Jonathan. The Hopper System: How the Largest XE6 in the World Went From Requirements to Reality. United States: N. p., 2017. Web.
Antypas, Katie, Butler, Tina, & Carter, Jonathan. The Hopper System: How the Largest XE6 in the World Went From Requirements to Reality. United States.
Antypas, Katie, Butler, Tina, and Carter, Jonathan. 2017. "The Hopper System: How the Largest XE6 in the World Went From Requirements to Reality". United States. https://www.osti.gov/servlets/purl/1398498.
@article{osti_1398498,
title = {The Hopper System: How the Largest XE6 in the World Went From Requirements to Reality.},
author = {Antypas, Katie and Butler, Tina and Carter, Jonathan},
abstractNote = {This paper will discuss the entire process of acquiring and deploying Hopper, from the first vendor market surveys to providing 3.8 million hours of production cycles per day for NERSC users. Installing the latest system at NERSC has been both a logistical and technical adventure. Balancing compute requirements with power, cooling, and space limitations drove the initial choice and configuration of the XE6, and a number of first-of-a-kind features implemented in collaboration with Cray have resulted in a high-performance, usable, and reliable system.},
place = {United States},
year = {2017},
month = {6}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

  • The Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) is the world's largest-scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF's diverse computational platforms, the aggregate performance and storage capacity of Spider exceed those of our previously deployed systems by factors of 6x (240 GB/s) and 17x (10 petabytes), respectively. Furthermore, Spider supports over 26,000 clients concurrently accessing the file system, which exceeds our previously deployed systems by nearly 4x. In addition to these scalability challenges, moving to a center-wide shared file system required dramatically improved resiliency and fault-tolerance mechanisms. This paper details our efforts in designing, deploying, and operating Spider. Through a phased approach of research and development, prototyping, deployment, and transition to operations, this work has resulted in a number of insights into large-scale parallel file system architectures, from both the design and the operational perspectives. We present in this paper our solutions to issues such as network congestion, performance baselining and evaluation, file system journaling overheads, and high availability in a system with tens of thousands of components. We also discuss areas of continued challenge, such as stressed metadata performance and the need for file system quality of service, along with our efforts to address them. Finally, operational aspects of managing a system of this scale are discussed, along with real-world data and observations.
  • The United States Department of Energy (DOE) procured new data collection equipment for the 42 vehicles registered to compete in the 1994 Hybrid Electric Vehicle (HEV) Challenge, increasing the amount of information gathered from the world's largest fleet of HEVs. Data were collected through an on-board data storage device and then analyzed to determine the effects of different hybrid control strategies on energy efficiency and driving performance. In this paper, the results of parallel hybrids versus series hybrids with respect to energy usage and acceleration performance are examined, and the efficiency and performance of the power-assist types are compared to those of the range-extender types. Because on-board and off-board electrical charging performance is critical to an efficient vehicle energy usage cycle, charging performance is presented, and changes and improvements from the 1993 HEV Challenge are discussed. Peak power used during acceleration is presented and then compared to the electric motor manufacturer ratings. Improvements in data acquisition methods for the 1995 HEV Challenge are recommended.
  • The Leadership Computing Facility (LCF) at Oak Ridge National Laboratory (ORNL) has a diverse portfolio of computational resources ranging from a petascale XT4/XT5 simulation system (Jaguar) to numerous other systems supporting development, visualization, and data analytics. In order to support the vastly different I/O needs of these systems, Spider, a Lustre-based center-wide file system, was designed and deployed to provide over 240 GB/s of aggregate throughput with over 10 petabytes of formatted capacity. A multi-stage InfiniBand network, dubbed the Scalable I/O Network (SION), with over 889 GB/s of bisectional bandwidth was deployed as part of Spider to provide connectivity to our simulation, development, visualization, and other platforms. To our knowledge, at the time of writing, Spider is the largest and fastest POSIX-compliant parallel file system in production. This paper details the overall architecture of the Spider system, the challenges in deploying and initial testing of a file system of this scale, and novel solutions to these challenges, which offer key insights into future file system design.
  • The application of spatiotemporal (ST) analytics to integrated data from major sources such as the World Bank, United Nations, and dozens of others holds tremendous potential for shedding new light on the evolution of cultural, health, economic, and geopolitical landscapes on a global level. Realizing this potential first requires an ST data model that addresses challenges in properly merging data from multiple authors, with evolving ontological perspectives, semantic differences, and changing attributes, as well as content that is textual, numeric, categorical, and hierarchical. Equally challenging is the development of analytical and visualization approaches that provide a serious exploration of this integrated data while remaining accessible to practitioners with varied backgrounds. The WSTAMP project at Oak Ridge National Laboratory has yielded two major results in addressing these challenges: 1) development of the WSTAMP database, a significant advance in ST data modeling that integrates 10,000+ attributes covering over 200 nation states spanning over 50 years from over 30 major sources, and 2) a novel online ST exploratory and analysis tool providing an array of modern statistical and visualization techniques for analyzing these data temporally, spatially, and spatiotemporally under a standard analytic workflow. We discuss the status of this work and report on major findings. Acknowledgment: Prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract no. DE-AC05-00OR22725. Copyright: This manuscript has been authored by employees of UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. Accordingly, the United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.