OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: NetLogger: A toolkit for distributed system performance tuning and debugging

Technical Report
DOI: https://doi.org/10.2172/924785 · OSTI ID: 924785

Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. Determining the source of these problems requires detailed end-to-end instrumentation of all components, including the applications, operating systems, hosts, and networks. In this paper we describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating timestamped event logs that can be used to provide detailed end-to-end application and system level monitoring, and tools for visualizing the log data and the real-time state of the distributed system. This methodology, called NetLogger, has proven invaluable for diagnosing problems in networks and in distributed systems code. The approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system. NetLogger is designed to be extremely lightweight, and includes a mechanism for reliably collecting monitoring events from multiple distributed locations. This technical report summarizes the most important points of several previous papers on NetLogger, and is meant to be used as a general overview.
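The idea of timestamped event logs gathered from several locations can be illustrated with a minimal Python sketch. It is not NetLogger's API or log format: the function names (`make_event`, `merge_logs`, `stage_latencies`), the record fields, and the sample timestamps are all hypothetical, chosen only to show how per-host events might be merged into one timeline and turned into per-stage latencies. The sketch also assumes the hosts' clocks are synchronized closely enough for timestamps to be compared across machines.

```python
import time


def make_event(name, host, trace_id, ts=None):
    """Build one timestamped event record (hypothetical field names)."""
    return {
        "ts": time.time() if ts is None else ts,  # seconds since the epoch
        "host": host,
        "event": name,
        "id": trace_id,  # ties together all events of one request
    }


def merge_logs(*logs):
    """Combine per-host event lists into one timeline ordered by timestamp."""
    return sorted((e for log in logs for e in log), key=lambda e: e["ts"])


def stage_latencies(timeline, trace_id):
    """Return (from_event, to_event, elapsed_seconds) for consecutive
    events that belong to the same request."""
    events = [e for e in timeline if e["id"] == trace_id]
    return [
        (a["event"], b["event"], b["ts"] - a["ts"])
        for a, b in zip(events, events[1:])
    ]


# Illustrative logs from two hosts; timestamps are made up for the example.
client_log = [
    make_event("client.send", "hostA", 1, ts=10.000),
    make_event("client.recv", "hostA", 1, ts=10.250),
]
server_log = [
    make_event("server.recv", "hostB", 1, ts=10.040),
    make_event("server.send", "hostB", 1, ts=10.200),
]

timeline = merge_logs(client_log, server_log)
for src, dst, dt in stage_latencies(timeline, 1):
    print(f"{src} -> {dst}: {dt:.3f} s")
```

Reading the elapsed time between consecutive events shows which stage (network transfer, server processing, or the return path) accounts for most of the end-to-end latency of a request.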

Research Organization:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
DE-AC02-05CH11231
OSTI ID:
924785
Report Number(s):
LBNL-51276; R&D Project: UNKNOWN; BnR: KJ0102000; TRN: US200811%%180
Country of Publication:
United States
Language:
English