DOE PAGES

Title: An evaluation of the state of time synchronization on leadership class supercomputers

We present a detailed examination of time agreement characteristics for nodes within extreme-scale parallel computers. Using a software tool we introduce in this paper, we quantify attributes of clock skew among nodes in three representative high-performance computers sited at three national laboratories. Our measurements detail the statistical properties of time agreement among nodes and how time agreement drifts over typical application execution durations. We discuss the implications of our measurements, explain why the current state of the field is inadequate, and propose strategies to address the observed shortcomings.
Authors:
[1] ;  [1] ;  [1] ;  [2] ;  [3]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division
  2. Univ. Autonoma de Occidente, Cali (Colombia)
  3. Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Computer Science
Publication Date:
Grant/Contract Number:
AC05-00OR22725; AC02-06CH11357; AC02-05CH11231
Type:
Accepted Manuscript
Journal Name:
Concurrency and Computation. Practice and Experience
Additional Journal Information:
Journal Volume: 30; Journal Issue: 4; Journal ID: ISSN 1532-0626
Publisher:
Wiley
Research Org:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; clock synchronization; large-scale systems; system software; time service
OSTI Identifier:
1432152