DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting analytics techniques in CMS computing monitoring

Abstract

The CMS experiment has collected an enormous volume of metadata about its computing operations in its monitoring systems, describing its experience in operating all of the CMS workflows on all of the Worldwide LHC Computing Grid Tiers. Data mining of this information has rarely been attempted, but is of crucial importance for a better understanding of how CMS achieved successful operations and for building an adequate, adaptive model of CMS operations that allows detailed optimizations and, eventually, prediction of system behaviour. These data are now streamed into the CERN Hadoop data cluster for further analysis. Specific sets of information (e.g. how many replicas of each dataset CMS wrote to disk at the WLCG Tiers, or which datasets were primarily requested for analysis) were collected on Hadoop and processed with MapReduce applications that profit from the parallelization available on the Hadoop cluster. We present the implementation of new monitoring applications on Hadoop, and discuss the new possibilities in CMS computing monitoring introduced by the ability to quickly process big data sets from multiple sources, looking forward to a predictive modelling of the system.
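
As an illustration of the kind of MapReduce processing described in the abstract, the sketch below shows a minimal Hadoop Streaming mapper/reducer pair in Python that counts dataset replicas per WLCG Tier. The input record layout (CSV lines of dataset,site,size) and the script itself are assumptions made purely for illustration; they do not reproduce the actual CMS record schema or the monitoring applications presented in the paper.

#!/usr/bin/env python
"""Minimal Hadoop Streaming sketch: count dataset replicas per WLCG Tier.

Assumes (for illustration only) that block-replica monitoring records have
been exported to the cluster as CSV lines of the form:
    dataset,site,replica_size_bytes
"""
import sys

def mapper():
    # Emit one ("dataset|tier", 1) pair per replica record read from stdin.
    for line in sys.stdin:
        parts = line.strip().split(",")
        if len(parts) < 2:
            continue  # skip malformed records
        dataset, site = parts[0], parts[1]
        tier = site.split("_")[0]  # e.g. "T1" from "T1_US_FNAL"
        print("%s|%s\t1" % (dataset, tier))

def reducer():
    # Hadoop delivers keys sorted, so a running total per key is enough.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    # Invoked as "replica_count.py map" or "replica_count.py reduce".
    if len(sys.argv) > 1 and sys.argv[1] == "reduce":
        reducer()
    else:
        mapper()

A job of this kind would typically be submitted with the standard hadoop-streaming jar, pointing its -mapper and -reducer options at the script above, so that the work is parallelized across the Hadoop cluster in the way the abstract describes.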

Authors:
 Bonacorsi, D. [1]; Kuznetsov, V. [2]; Magini, N. [3]; Repečka, A. [4]; Vaandering, E. [3]
  1. Univ. di Bologna (Italy)
  2. Cornell Univ., Ithaca, NY (United States)
  3. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  4. Univ. of Vilnius (Lithuania)
Publication Date:
2017-11-22
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP)
OSTI Identifier:
1415640
Report Number(s):
FERMILAB-CONF-16-735-CD
Journal ID: ISSN 1742-6588; 1638624
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physics. Conference Series
Additional Journal Information:
Journal Volume: 898; Journal Issue: 9; Conference: 22nd International Conference on Computing in High Energy and Nuclear Physics, San Francisco, CA, 10/10-10/14/2016; Journal ID: ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; 97 MATHEMATICS AND COMPUTING

Citation Formats

Bonacorsi, D., Kuznetsov, V., Magini, N., Repečka, A., and Vaandering, E. Exploiting analytics techniques in CMS computing monitoring. United States: N. p., 2017. Web. doi:10.1088/1742-6596/898/9/092030.
Bonacorsi, D., Kuznetsov, V., Magini, N., Repečka, A., & Vaandering, E. Exploiting analytics techniques in CMS computing monitoring. United States. https://doi.org/10.1088/1742-6596/898/9/092030
Bonacorsi, D., Kuznetsov, V., Magini, N., Repečka, A., and Vaandering, E. 2017. "Exploiting analytics techniques in CMS computing monitoring". United States. https://doi.org/10.1088/1742-6596/898/9/092030. https://www.osti.gov/servlets/purl/1415640.
@article{osti_1415640,
title = {Exploiting analytics techniques in CMS computing monitoring},
author = {Bonacorsi, D. and Kuznetsov, V. and Magini, N. and Repečka, A. and Vaandering, E.},
abstractNote = {The CMS experiment has collected an enormous volume of metadata about its computing operations in its monitoring systems, describing its experience in operating all of the CMS workflows on all of the Worldwide LHC Computing Grid Tiers. Data mining of this information has rarely been attempted, but is of crucial importance for a better understanding of how CMS achieved successful operations and for building an adequate, adaptive model of CMS operations that allows detailed optimizations and, eventually, prediction of system behaviour. These data are now streamed into the CERN Hadoop data cluster for further analysis. Specific sets of information (e.g. how many replicas of each dataset CMS wrote to disk at the WLCG Tiers, or which datasets were primarily requested for analysis) were collected on Hadoop and processed with MapReduce applications that profit from the parallelization available on the Hadoop cluster. We present the implementation of new monitoring applications on Hadoop, and discuss the new possibilities in CMS computing monitoring introduced by the ability to quickly process big data sets from multiple sources, looking forward to a predictive modelling of the system.},
doi = {10.1088/1742-6596/898/9/092030},
journal = {Journal of Physics. Conference Series},
number = 9,
volume = 898,
place = {United States},
year = {2017},
month = {nov}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Figures / Tables:

Figure 1: Structure of the CMS dataset block replica monitoring system
