OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Monitoring Extreme-scale Lustre Toolkit

Abstract

We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.

Authors:
Brim, Michael J [1]; Lothian, Josh [1]
  1. ORNL
Publication Date:
2015
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Org.:
Work for Others (WFO)
OSTI Identifier:
1185971
DOE Contract Number:
DE-AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: International Workshop on the Lustre Ecosystem: Challenges and Opportunities, Annapolis, MD, USA, March 3-4, 2015
Country of Publication:
United States
Language:
English
Subject:
Lustre; performance monitoring; overlay network; data aggregation

Citation Formats

Brim, Michael J, and Lothian, Josh. Monitoring Extreme-scale Lustre Toolkit. United States: N. p., 2015. Web.
Brim, Michael J, & Lothian, Josh. Monitoring Extreme-scale Lustre Toolkit. United States.
Brim, Michael J, and Lothian, Josh. 2015. "Monitoring Extreme-scale Lustre Toolkit". United States.
@article{osti_1185971,
title = {Monitoring Extreme-scale Lustre Toolkit},
author = {Brim, Michael J and Lothian, Josh},
abstractNote = {We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.},
place = {United States},
year = {2015}
}

