
Title: Toward a Performance/Resilience Tool for Hardware/Software Co-Design of High-Performance Computing Systems

Conference · OSTI ID: 1107829

xSim is a simulation-based performance investigation toolkit that runs high-performance computing (HPC) applications in a controlled environment with millions of concurrent execution threads, while observing application performance in a simulated extreme-scale system for hardware/software co-design. This work details newly developed xSim features that permit the injection of MPI process failures, the propagation, detection, and notification of such failures within the simulation, and their handling using application-level checkpoint/restart. These new capabilities make it possible to observe application behavior and performance under failure within a simulated future-generation HPC system using the most common fault-handling technique.
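
As a rough illustration of the fault-handling technique the abstract names, the following minimal sketch shows application-level checkpoint/restart under injected failures in a toy iterative job. It is not xSim code and does not use MPI. The function name `run_with_checkpoint_restart` and its parameters are hypothetical, and a failure is modeled simply as a rollback to the most recent checkpoint.

```python
def run_with_checkpoint_restart(total_steps, checkpoint_interval, failure_steps):
    """Run a toy iterative job with application-level checkpoint/restart.

    All names here are illustrative assumptions, not part of xSim.
    failure_steps lists the step indices at which a failure is injected
    (each failure fires once). Returns (final_state, executed_steps, restarts).
    """
    pending_failures = set(failure_steps)
    checkpoint = (0, 0)          # last saved (step, state); initial state counts as a checkpoint
    step, state = checkpoint
    executed = 0                 # total steps of work performed, including redone work
    restarts = 0

    while step < total_steps:
        if step in pending_failures:
            # Injected failure: lose all progress since the last checkpoint.
            pending_failures.discard(step)
            step, state = checkpoint
            restarts += 1
            continue

        state += step            # stand-in for one iteration of real computation
        step += 1
        executed += 1

        if step % checkpoint_interval == 0:
            checkpoint = (step, state)   # save application state

    return state, executed, restarts


if __name__ == "__main__":
    # One failure at step 6 with checkpoints every 4 steps: steps 4 and 5 are redone.
    print(run_with_checkpoint_restart(10, 4, [6]))
```

In this sketch, the difference between `executed_steps` and `total_steps` is the work lost to failures, which is the kind of overhead a simulation of application behavior under failure would expose.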

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1107829
Resource Relation:
Conference: International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI) 2013, Lyon, France, 2013-10-01 to 2013-10-04
Country of Publication:
United States
Language:
English
