OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: An Analysis of HPCC Results on the Cray XT4

Abstract

In the proceedings of CUG 2006 we published a paper evaluating the performance of the HPCC benchmark suite on the Cray XT3 (jaguar) and the newly upgraded Cray X1E (phoenix) supercomputers at the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL). Although phoenix has remained unchanged since that paper was published, jaguar has undergone significant changes: it has been upgraded from a single-core XT3 system to a combined dual-core XT3 and XT4 system. This paper revisits HPCC on jaguar and discusses the effect that the past year's system upgrades have had on overall performance.
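
The following is a minimal, hypothetical sketch (not taken from the paper) of how two sets of HPCC results, such as a pre-upgrade and a post-upgrade run on jaguar, might be compared. It assumes the key=value summary section that the HPCC harness writes to its output file (hpccoutf.txt); the file names and the choice of metrics below are illustrative assumptions.

# Hypothetical sketch: compare two HPCC summary files, e.g. runs made before
# and after a system upgrade. Assumes the usual "key=value" lines in the
# summary section of hpccoutf.txt; file names and metric keys are illustrative.

def read_hpcc_summary(path):
    """Collect numeric key=value pairs from an HPCC output file."""
    results = {}
    with open(path) as f:
        for line in f:
            if "=" in line:
                key, _, value = line.partition("=")
                try:
                    results[key.strip()] = float(value.strip())
                except ValueError:
                    pass  # skip non-numeric entries such as version strings
    return results

if __name__ == "__main__":
    before = read_hpcc_summary("hpccoutf_xt3.txt")  # assumed: single-core XT3 run
    after = read_hpcc_summary("hpccoutf_xt4.txt")   # assumed: dual-core XT3/XT4 run
    for metric in ("HPL_Tflops", "PTRANS_GBs", "MPIRandomAccess_GUPs", "StarSTREAM_Triad"):
        if metric in before and metric in after:
            ratio = after[metric] / before[metric]
            print(f"{metric}: {before[metric]:.3f} -> {after[metric]:.3f} ({ratio:.2f}x)")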

Authors:
Kuehn, Jeffery A [1]; Wichmann, Nathan L [1]; Larkin, Jeffrey M [1]
  1. ORNL
Publication Date:
2007
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Center for Computational Sciences
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
931848
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: CUG 2007, Seattle, WA, USA, May 7-10, 2007
Country of Publication:
United States
Language:
English

Citation Formats

Kuehn, Jeffery A, Wichmann, Nathan L, and Larkin, Jeffrey M. An Analysis of HPCC Results on the Cray XT4. United States: N. p., 2007. Web.
Kuehn, Jeffery A, Wichmann, Nathan L, & Larkin, Jeffrey M. An Analysis of HPCC Results on the Cray XT4. United States.
Kuehn, Jeffery A, Wichmann, Nathan L, and Larkin, Jeffrey M. 2007. "An Analysis of HPCC Results on the Cray XT4". United States.
@article{osti_931848,
title = {An Analysis of HPCC Results on the Cray XT4},
author = {Kuehn, Jeffery A and Wichmann, Nathan L and Larkin, Jeffrey M},
abstractNote = {In the proceedings of CUG 2006 we published a paper evaluating the performance of the HPCC benchmark suite on the Cray XT3 (jaguar) and the newly upgraded Cray X1E (phoenix) supercomputers at the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL). Although phoenix has remained unchanged since that paper was published, jaguar has undergone significant changes: it has been upgraded from a single-core XT3 system to a combined dual-core XT3 and XT4 system. This paper revisits HPCC on jaguar and discusses the effect that the past year's system upgrades have had on overall performance.},
place = {United States},
year = {2007},
month = {jan}
}

Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.

Similar Records:
  • Scientific data sets produced by modern supercomputers like ORNL's Cray XT4, Jaguar, can be extremely large, making visualization and analysis more difficult, as moving large resultant data to dedicated analysis systems can be prohibitively expensive. We share our continuing work of integrating a parallel visualization system, ParaView, on ORNL's Jaguar system and our efforts to enable extreme-scale interactive data visualization and analysis. We will discuss porting challenges and present performance numbers.
  • This paper will present an overview of the current status of I/O on the Cray XT line of supercomputers and provide guidance to application developers and users for achieving efficient I/O. A large set of I/O benchmark results will be presented, motivated by projected I/O requirements for some widely used scientific applications in the DOE. Finally, the authors will interpret and summarize these benchmark results to give forward-looking guidance for I/O in large-scale application runs on a Cray XT3/XT4.
  • The Cray XT3 and XT4 have similar architectures, differing primarily in memory performance and in bandwidth between the node and interconnect. This paper evaluates and compares the scalability of the XT3 and XT4. Kernel benchmarks are used to verify and to quantify the performance differences between the systems. Application benchmarks are used to examine the impact of these differences on scalability. Both kernel and application benchmarks are used to identify how to use the systems most efficiently.
  • The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using microbenchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.
  • We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at the National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.