
DOE PAGES

Title: A new deadlock resolution protocol and message matching algorithm for the extreme-scale simulator

Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures, and the performance impact of different HPC architecture choices, is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale, and it scales to millions of simulated Message Passing Interface (MPI) processes. The overhead a simulation tool introduces is an important aspect of its performance and productivity. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol that reduces the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm that reduces the oversubscription management overhead. The results show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim also offers a highly accurate simulation mode for better tracking of injected MPI process failures. With this highly accurate mode enabled, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.
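The abstract does not describe how xSim's new matching algorithm works. As background only, the sketch below shows the standard MPI matching semantics that any simulated MPI layer has to preserve: each process keeps a queue of posted receives and a queue of unexpected messages, a message matches a receive on (communicator, source, tag) with `MPI_ANY_SOURCE`/`MPI_ANY_TAG` wildcards, and the first match in posting or arrival order wins. This is not xSim's algorithm; the class and function names, and the `-1` stand-ins for the wildcards, are hypothetical. The linear scans are the naive baseline whose cost grows with queue length.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

ANY_SOURCE = -1  # hypothetical stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # hypothetical stand-in for MPI_ANY_TAG


@dataclass(eq=False)
class Message:
    """An incoming point-to-point message (hypothetical type)."""
    comm: int
    source: int
    tag: int
    payload: object = None


@dataclass(eq=False)
class PostedRecv:
    """A receive posted by the application (hypothetical type)."""
    comm: int
    source: int
    tag: int
    matched: Optional[Message] = None


def matches(recv: PostedRecv, msg: Message) -> bool:
    # Communicator must match exactly; source and tag may be wildcards.
    return (recv.comm == msg.comm
            and recv.source in (ANY_SOURCE, msg.source)
            and recv.tag in (ANY_TAG, msg.tag))


class MatchingEngine:
    """Per-process matching state: posted receives and unexpected messages."""

    def __init__(self) -> None:
        self.posted: deque = deque()
        self.unexpected: deque = deque()

    def arrive(self, msg: Message) -> Optional[PostedRecv]:
        # Scan posted receives in posting order; the first match wins.
        for i, recv in enumerate(self.posted):
            if matches(recv, msg):
                del self.posted[i]
                recv.matched = msg
                return recv
        # No receive is waiting: buffer the message as unexpected.
        self.unexpected.append(msg)
        return None

    def post_recv(self, recv: PostedRecv) -> Optional[Message]:
        # Scan unexpected messages in arrival order; the first match wins.
        for i, msg in enumerate(self.unexpected):
            if matches(recv, msg):
                del self.unexpected[i]
                recv.matched = msg
                return msg
        # Nothing has arrived yet: queue the receive.
        self.posted.append(recv)
        return None
```

For example, two messages from rank 3 with tag 7 arrive before any receive is posted. A later receive with `source=ANY_SOURCE, tag=7` then takes the earlier of the two, which preserves MPI's non-overtaking order.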
Authors:
 [1] ;  [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Grant/Contract Number:
DE-AC05-00OR22725
Type:
Accepted Manuscript
Journal Name:
Concurrency and Computation: Practice and Experience
Additional Journal Information:
Journal Volume: 28; Journal Issue: 12; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE Office of Science (SC)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; performance prediction; message passing interface; parallel discrete event simulation; high-performance computing
OSTI Identifier:
1286913
Alternate Identifier(s):
OSTI ID: 1401192