R&D100: Lightweight Distributed Metric Service
Abstract
On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS collects, transports, and stores data from extreme-scale systems at fidelities and timescales sufficient to give insight into application and system performance, with no statistically significant impact on application performance.
- Authors:
- Gentile, Ann; Brandt, Jim; Tucker, Tom; Showerman, Mike
- Publication Date:
- 2015-11-19
- Research Org.:
- Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA)
- OSTI Identifier:
- 1328737
- Resource Type:
- Multimedia
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; HIGH PERFORMANCE COMPUTING SYSTEMS; DATA; COMPUTER RESOURCES; LDMS; NCSA BLUE WATER SUPERCOMPUTER
Citation Formats
Gentile, Ann, Brandt, Jim, Tucker, Tom, and Showerman, Mike. R&D100: Lightweight Distributed Metric Service. United States: N. p., 2015. Web.
Gentile, Ann, Brandt, Jim, Tucker, Tom, & Showerman, Mike. R&D100: Lightweight Distributed Metric Service. United States.
Gentile, Ann, Brandt, Jim, Tucker, Tom, and Showerman, Mike. 2015. "R&D100: Lightweight Distributed Metric Service". United States. https://www.osti.gov/servlets/purl/1328737.
@article{osti_1328737,
title = {R\&D100: Lightweight Distributed Metric Service},
author = {Gentile, Ann and Brandt, Jim and Tucker, Tom and Showerman, Mike},
abstractNote = {On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {11}
}