Grid site availability evaluation and monitoring at CMS
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Vilnius Univ., Vilnius (Lithuania)
- Univ. di Pisa & INFN, Pisa (Italy)
- European Organization for Nuclear Research (CERN), Geneva (Switzerland)
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources ranging from a hundred to well over ten thousand computing cores, with storage from tens of TBytes to tens of PBytes. In such a large computing setup, scheduled and unscheduled outages occur continually and must not significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. For Run 1 of the LHC, CMS developed a sophisticated site evaluation and monitoring system based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC, the site evaluation and monitoring system is being overhauled to enable faster detection of and reaction to failures and a more dynamic handling of computing resources. Furthermore, enhancements are planned to better distinguish site issues from central service issues and to make evaluations more transparent and informative to site support staff.
- Research Organization:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- Grant/Contract Number:
- AC02-07CH11359
- OSTI ID:
- 1415641
- Report Number(s):
- FERMILAB-CONF-16-752-CD; 1638611; TRN: US1800845
- Journal Information:
- Journal of Physics. Conference Series, Vol. 898, Issue 9; ISSN 1742-6588
- Publisher:
- IOP Publishing
- Country of Publication:
- United States
- Language:
- English