Monitoring Extreme-scale Lustre Toolkit
Abstract
We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.
- Authors:
- Brim, Michael J; Lothian, Josh
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Publication Date:
- Research Org.:
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Sponsoring Org.:
- Work for Others (WFO)
- OSTI Identifier:
- 1185971
- DOE Contract Number:
- DE-AC05-00OR22725
- Resource Type:
- Conference
- Resource Relation:
- Conference: International Workshop on the Lustre Ecosystem: Challenges and Opportunities, Annapolis, MD, USA, March 3-4, 2015
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Lustre; performance monitoring; overlay network; data aggregation
Citation Formats
Brim, Michael J, and Lothian, Josh. Monitoring Extreme-scale Lustre Toolkit. United States: N. p., 2015. Web.
Brim, Michael J, & Lothian, Josh. Monitoring Extreme-scale Lustre Toolkit. United States.
Brim, Michael J, and Lothian, Josh. 2015. "Monitoring Extreme-scale Lustre Toolkit". United States. https://www.osti.gov/servlets/purl/1185971.
@article{osti_1185971,
title = {Monitoring Extreme-scale Lustre Toolkit},
author = {Brim, Michael J and Lothian, Josh},
abstractNote = {We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.},
doi = {},
url = {https://www.osti.gov/biblio/1185971},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2015},
month = {1}
}