OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: One network metric datastore to track them all: the OSG network metric service

Journal Article · Journal of Physics. Conference Series
 [1];  [2];  [3];  [1];  [1];  [1];  [1];  [4];  [1];  [1]
  1. Indiana Univ., Bloomington, IN (United States)
  2. European Organization for Nuclear Research (CERN), Geneva (Switzerland)
  3. University of California San Diego, La Jolla, CA (United States)
  4. Univ. of Michigan, Ann Arbor, MI (United States)

The Open Science Grid (OSG) relies upon the network as a critical part of the distributed infrastructures it enables. In 2012, OSG added a new focus area in networking with the goal of becoming the primary source of network information for its members and collaborators. This includes gathering, organizing, and providing network metrics to ensure effective network usage and prompt detection and resolution of network issues, including connection failures, congestion, and traffic routing problems. In September 2015, this service was deployed into the OSG production environment. We will report on the creation, implementation, testing, and deployment of the OSG Networking Service and review all aspects of its implementation. These range from organizing the deployment of perfSONAR toolkits within OSG and its partners, through the challenges of orchestrating regular testing between sites, to reliably gathering the resulting network metrics and making them available to users, virtual organizations, and higher-level services. In particular, several higher-level services were developed to bring the OSG network service to its full potential. These include a web-based mesh configuration system, which allows central scheduling and management of all the network tests performed by the perfSONAR instances; a set of probes that continually gather metrics from the remote instances and publish them to different destinations; a central network datastore (esmond), which provides interfaces to access the network monitoring information both in near real time and historically (up to a year), giving the state of the tests; and a perfSONAR infrastructure monitor system, which ensures that the current perfSONAR instances are correctly configured and operating as intended. We will also describe the challenges we encountered in ongoing operations of the network service and how we have evolved our procedures to address them. Finally, we will describe our plans for future extensions and improvements to the service.
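The data flow summarized above can be sketched in a few lines of Python. A central mesh configuration schedules tests between perfSONAR instances, and a central esmond datastore serves the resulting metrics. This is a minimal sketch under stated assumptions, not the OSG implementation. It makes four assumptions:

- The mesh is a full mesh of directional tests among a hypothetical host list.
- The archive base URL is a placeholder.
- The esmond archive accepts the query parameters `source`, `destination`, `event-type`, and `time-range` (in seconds).
- Queries are capped at the one-year history the abstract mentions.

Verify the parameter names against the esmond documentation before relying on them.

```python
from itertools import permutations
from urllib.parse import urlencode

# HYPOTHETICAL base URL of a central esmond archive; the real OSG endpoint is not given in this record.
ESMOND_ARCHIVE = "https://datastore.example.org/esmond/perfsonar/archive/"

# Assumed history window: the abstract says data is kept "up to a year".
ONE_YEAR_SECONDS = 365 * 24 * 3600


def full_mesh(hosts):
    """Return every ordered (source, destination) pair, excluding self-pairs.

    ASSUMPTION: tests run in both directions between every pair of listed hosts.
    """
    return list(permutations(hosts, 2))


def archive_query_url(source, destination, event_type="throughput", time_range=86400):
    """Build an esmond archive query URL for one source/destination pair.

    ASSUMPTION: parameter names follow the esmond archive query interface;
    time_range is in seconds and is capped at the assumed one-year history.
    """
    params = {
        "source": source,
        "destination": destination,
        "event-type": event_type,
        "time-range": min(time_range, ONE_YEAR_SECONDS),
    }
    return ESMOND_ARCHIVE + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical perfSONAR hosts; not real OSG sites.
    sites = ["ps-a.example.org", "ps-b.example.org", "ps-c.example.org"]
    tests = full_mesh(sites)
    print(len(tests))  # 3 hosts -> 6 directed tests
    for src, dst in tests[:2]:
        print(archive_query_url(src, dst))
```

With three hosts, the mesh yields six directed tests, and each URL could then be fetched with any HTTP client. Generating the test list from one central host list mirrors the abstract's point that scheduling is managed in one place rather than configured separately on each instance.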

Research Organization:
Univ. of Michigan, Ann Arbor, MI (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP); National Science Foundation (NSF)
Grant/Contract Number:
SC0007859; 1148698
OSTI ID:
1638834
Journal Information:
Journal of Physics. Conference Series, Vol. 898; ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English

Figures / Tables (2)


Similar Records

ASCR Science Network Requirements
Technical Report · Aug 24, 2009

Networks in ATLAS
Journal Article · Oct 1, 2017 · Journal of Physics. Conference Series

SDN for End-to-end Networked Science at the Exascale (SENSE) - Final Technical Report
Technical Report · Dec 2, 2019