DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Grid site availability evaluation and monitoring at CMS

Journal Article · Journal of Physics. Conference Series
 [1];  [2];  [3];  [1];  [4]
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  2. Vilnius Univ., Vilnius (Lithuania)
  3. Univ. di Pisa & INFN, Pisa (Italy)
  4. European Organization for Nuclear Research (CERN), Geneva (Switzerland)

The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources, ranging from a hundred to well over ten thousand computing cores and from tens of TBytes to tens of PBytes of storage. In such a large computing setup, scheduled and unscheduled outages occur continually and must not significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. For Run 1 of the LHC, CMS developed a sophisticated site evaluation and monitoring system based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC, the site evaluation and monitoring system is being overhauled to enable faster detection of, and reaction to, failures and a more dynamic handling of computing resources. Furthermore, enhancements are planned to better distinguish site issues from central service issues and to make evaluations more transparent and informative to site support staff.

Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
AC02-07CH11359
OSTI ID:
1415641
Report Number(s):
FERMILAB-CONF-16-752-CD; 1638611; TRN: US1800845
Journal Information:
Journal of Physics. Conference Series, Vol. 898, Issue 9; ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English

Similar Records

Experience with dynamic resource provisioning of the CMS online cluster using a cloud overlay
Journal Article · 2019 · EPJ Web of Conferences · OSTI ID:1574987

CMS readiness for multi-core workload scheduling
Journal Article · 2017 · Journal of Physics. Conference Series · OSTI ID:1420916

Monitoring data transfer latency in CMS computing operations
Journal Article · 2015 · Journal of Physics. Conference Series · OSTI ID:1346387

Related Subjects