OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable Node Monitoring

Conference
OSTI ID:1048677

The project had two goals: (1) build a high-performance computer; and (2) create a tool that monitors node applications within the Component Based Tool Framework (CBTF), using code from the Lightweight Data Metric Service (LDMS). The project matters because (1) there is a need for a scalable, parallel tool to monitor nodes on clusters; and (2) new LDMS plugins must be easy to add to the tool. CBTF is scalable and adjusts to different topologies automatically. It uses the MRNet (Multicast/Reduction Network) mechanism for information transport. CBTF is flexible and general enough to be used for any tool that needs to perform a task on many nodes, and its components are reusable and easily added to a new tool. CBTF has three levels: (1) the frontend node, which interacts with users; (2) filter nodes, which filter or concatenate information from backend nodes; and (3) backend nodes, where the actual work of the tool is done. LDMS is a tool for monitoring nodes. Ltool is the name of the tool we derived from LDMS. It is dynamically linked and includes components such as Vmstat, Meminfo, and Procinterrupts. It works as follows: the Ltool command is run on the frontend node; Ltool collects information from the backend nodes; the backend nodes send that information to the filter nodes; and the filter nodes concatenate it and send it to a database on the frontend node. Ltool is useful for monitoring nodes on a cluster because the overhead of running it is modest and it scales automatically to a cluster of any size.
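The three-level data flow described in the abstract can be illustrated with a minimal Python sketch. It is not CBTF, MRNet, or LDMS code: every class and function name below (`BackendNode`, `FilterNode`, `FrontendNode`, `parse_meminfo`, `build_demo_tree`) is hypothetical, direct method calls stand in for MRNet transport, a fixed string stands in for reading `/proc/meminfo` on each host, and a plain dictionary stands in for the frontend database.

```python
from dataclasses import dataclass, field


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    metrics = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            metrics[name.strip()] = int(parts[0])
    return metrics


@dataclass
class BackendNode:
    """Leaf of the tree: samples metrics on one compute node (hypothetical)."""
    hostname: str
    meminfo_text: str  # assumption: stand-in for reading /proc/meminfo

    def collect(self):
        return [{"host": self.hostname,
                 "metrics": parse_meminfo(self.meminfo_text)}]


@dataclass
class FilterNode:
    """Interior node: concatenates the records produced by its children."""
    children: list = field(default_factory=list)

    def collect(self):
        records = []
        for child in self.children:
            records.extend(child.collect())
        return records


@dataclass
class FrontendNode:
    """Root: triggers collection and stores results in a dict 'database'."""
    children: list = field(default_factory=list)
    database: dict = field(default_factory=dict)

    def run(self):
        for child in self.children:
            for record in child.collect():
                self.database[record["host"]] = record["metrics"]
        return self.database


# Assumed sample input in /proc/meminfo format.
SAMPLE = "MemTotal:       16384 kB\nMemFree:         4096 kB\n"


def build_demo_tree():
    """Four backends under two filters under one frontend."""
    backends = [BackendNode(f"node{i}", SAMPLE) for i in range(4)]
    filters = [FilterNode(backends[:2]), FilterNode(backends[2:])]
    return FrontendNode(filters)
```

Calling `build_demo_tree().run()` returns a dictionary keyed by hostname. Adding more backends or another layer of `FilterNode` objects changes only the tree construction, which is the property the abstract attributes to the CBTF topology.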

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
DOE/LANL
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1048677
Report Number(s):
LA-UR-12-23629; TRN: US201216%%1081
Resource Relation:
Conference: Computing and Information Technology Student Mini Showcase ; 2012-08-02 - 2012-08-02 ; Los Alamos, New Mexico, United States
Country of Publication:
United States
Language:
English
