Scalable Node Monitoring
Abstract
The project has two goals: (1) build a high performance computer; and (2) create a tool to monitor node applications in the Component Based Tool Framework (CBTF) using code from the Lightweight Data Metric Service (LDMS). The project matters because: (1) there is a need for a scalable, parallel tool to monitor nodes on clusters; and (2) new LDMS plugins need to be easy to add to the tool. CBTF is scalable and adjusts to different topologies automatically. It uses the MRNet (Multicast/Reduction Network) mechanism for information transport. CBTF is flexible and general enough to be used for any tool that needs to do a task on many nodes. Its components are reusable and easily added to a new tool. There are three levels of CBTF: (1) frontend node - interacts with users; (2) filter nodes - filter or concatenate information from backend nodes; and (3) backend nodes - where the actual work of the tool is done. LDMS is a tool used for monitoring nodes. Ltool is the name of the tool we derived from LDMS. It is dynamically linked and includes the following components: Vmstat, Meminfo, Procinterrupts and more.
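The three CBTF levels described in the abstract can be illustrated with a small, self-contained Python sketch. This is not CBTF, MRNet, or LDMS code: the class names, the `samplers` dictionary, and the `fake_meminfo`/`fake_vmstat` functions are hypothetical stand-ins, and the whole tree runs in one process rather than across a cluster. It only shows the shape of the data flow: backend nodes do the sampling, filter nodes concatenate what their children return, and the frontend node stores the result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# Hypothetical sampler type: returns {metric name: value} for one node.
Sampler = Callable[[], Dict[str, int]]


def fake_meminfo() -> Dict[str, int]:
    # Placeholder values; a real plugin would read them from the host.
    return {"MemTotal": 2048, "MemFree": 1024}


def fake_vmstat() -> Dict[str, int]:
    # Placeholder values; a real plugin would read them from the host.
    return {"pgfault": 10}


@dataclass
class BackendNode:
    """Leaf level: where the actual sampling work is done."""
    name: str
    samplers: Dict[str, Sampler] = field(default_factory=dict)

    def collect(self) -> List[dict]:
        return [
            {"node": self.name, "plugin": plugin, "metrics": sampler()}
            for plugin, sampler in self.samplers.items()
        ]


@dataclass
class FilterNode:
    """Middle level: concatenates records from its children."""
    children: List[Union["BackendNode", "FilterNode"]]

    def collect(self) -> List[dict]:
        records: List[dict] = []
        for child in self.children:
            records.extend(child.collect())
        return records


@dataclass
class FrontendNode:
    """Top level: the user runs the tool here and results are stored here."""
    root: FilterNode
    database: List[dict] = field(default_factory=list)

    def run(self) -> int:
        records = self.root.collect()
        self.database.extend(records)
        return len(records)


def build_demo_tree(n_backends: int = 4, fanout: int = 2) -> FrontendNode:
    """Build a frontend -> filters -> backends tree for any node count."""
    plugins = {"meminfo": fake_meminfo, "vmstat": fake_vmstat}
    backends = [BackendNode(f"node{i}", dict(plugins)) for i in range(n_backends)]
    filters = [
        FilterNode(backends[i:i + fanout]) for i in range(0, n_backends, fanout)
    ]
    return FrontendNode(FilterNode(filters))
```

In this sketch, adding a new plugin amounts to adding one entry to the `samplers` dictionary, and a larger cluster only changes the arguments to `build_demo_tree`. Both are simplifications of the "easily added" and "adjusts to different topologies" properties the abstract attributes to CBTF.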
- Authors:
- Drotar, Alexander P; Quinn, Erin E; Sutherland, Landon D
- Los Alamos National Laboratory
- Publication Date:
- 2012-07-30
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- DOE/LANL
- OSTI Identifier:
- 1048677
- Report Number(s):
- LA-UR-12-23629
TRN: US201216%%1081
- DOE Contract Number:
- AC52-06NA25396
- Resource Type:
- Conference
- Resource Relation:
- Conference: Computing and Information Technology Student Mini Showcase ; 2012-08-02 - 2012-08-02 ; Los Alamos, New Mexico, United States
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; METRICS; MONITORING; MONITORS; PERFORMANCE; TRANSPORT; COMPUTERS
Citation Formats
Drotar, Alexander P, Quinn, Erin E, and Sutherland, Landon D. Scalable Node Monitoring. United States: N. p., 2012. Web.
Drotar, Alexander P, Quinn, Erin E, & Sutherland, Landon D. Scalable Node Monitoring. United States.
Drotar, Alexander P, Quinn, Erin E, and Sutherland, Landon D. 2012. "Scalable Node Monitoring". United States. https://www.osti.gov/servlets/purl/1048677.
@article{osti_1048677,
title = {Scalable Node Monitoring},
author = {Drotar, Alexander P and Quinn, Erin E and Sutherland, Landon D},
abstractNote = {Project description is: (1) Build a high performance computer; and (2) Create a tool to monitor node applications in Component Based Tool Framework (CBTF) using code from Lightweight Data Metric Service (LDMS). The importance of this project is that: (1) there is a need for a scalable, parallel tool to monitor nodes on clusters; and (2) new LDMS plugins need to be easy to add to the tool. CBTF stands for Component Based Tool Framework. It's scalable and adjusts to different topologies automatically. It uses the MRNet (Multicast/Reduction Network) mechanism for information transport. CBTF is flexible and general enough to be used for any tool that needs to do a task on many nodes. Its components are reusable and easily added to a new tool. There are three levels of CBTF: (1) frontend node - interacts with users; (2) filter nodes - filter or concatenate information from backend nodes; and (3) backend nodes - where the actual work of the tool is done. LDMS stands for Lightweight Data Metric Service. It's a tool used for monitoring nodes. Ltool is the name of the tool we derived from LDMS. It's dynamically linked and includes the following components: Vmstat, Meminfo, Procinterrupts and more. It works as follows: the Ltool command is run on the frontend node; Ltool collects information from the backend nodes; backend nodes send information to the filter nodes; and filter nodes concatenate the information and send it to a database on the frontend node. Ltool is a useful tool for monitoring nodes on a cluster because the overhead involved in running it is not particularly high and it automatically scales to any size cluster.},
url = {https://www.osti.gov/biblio/1048677},
place = {United States},
year = {2012},
month = jul
}