XRootD popularity on hadoop clusters
Abstract
Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures to improve the throughput and efficiency of such monitoring. A large set of operational data (user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs) is being injected into a readily available Hadoop cluster via several data streamers. The collected metadata is further organized by running fast arbitrary queries; this makes it possible to test several MapReduce-based frameworks and measure the system speed-up relative to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.
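As an illustration of the kind of MapReduce-style aggregation the abstract refers to, the following is a minimal sketch in plain Python, not the authors' implementation. It counts accesses per dataset as a simple popularity measure. The record layout (user, dataset, site), the sample values, and all function names are hypothetical assumptions made for this example; a real deployment would run an equivalent job on the Hadoop cluster rather than in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical access record: (user, dataset, site).
# This layout is an assumption for illustration, not the CMS monitoring schema.
AccessRecord = Tuple[str, str, str]


def map_access(record: AccessRecord) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (dataset, 1) pair per access record."""
    _user, dataset, _site = record
    yield dataset, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key (the dataset name)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(dataset: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the per-record counts for one dataset."""
    return dataset, sum(counts)


def dataset_popularity(records: Iterable[AccessRecord]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in-process and return accesses per dataset."""
    emitted = (pair for record in records for pair in map_access(record))
    grouped = shuffle(emitted)
    return dict(reduce_counts(name, values) for name, values in grouped.items())


if __name__ == "__main__":
    # Made-up sample records, used only to exercise the sketch.
    sample = [
        ("alice", "/A/RAW", "T1"),
        ("bob", "/A/RAW", "T2"),
        ("alice", "/B/AOD", "T2"),
    ]
    print(dataset_popularity(sample))
```

The three steps are kept as separate functions so that each one maps onto the corresponding phase of a distributed job; only the in-process driver `dataset_popularity` would need to change.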
- Authors:
- Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico
- Univ. of Pisa (Italy); Istituto Nazionale di Fisica Nucleare (INFN), Pisa (Italy)
- Istituto Nazionale di Fisica Nucleare (INFN), Pisa (Italy)
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- European Organization for Nuclear Research (CERN), Geneva (Switzerland)
- European Organization for Nuclear Research (CERN), Geneva (Switzerland); CMS Collaboration, et al.
- Publication Date:
- 2017-11-22
- Research Org.:
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- Contributing Org.:
- CMS Collaboration
- OSTI Identifier:
- 1831862
- Report Number(s):
- FERMILAB-PUB-17-715-CMS; Journal ID: ISSN 1742-6588; oai:inspirehep.net:1638557; TRN: US2216651
- Grant/Contract Number:
- AC02-07CH11359
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Physics. Conference Series
- Additional Journal Information:
- Journal Volume: 898; Journal Issue: 7; Journal ID: ISSN 1742-6588
- Publisher:
- IOP Publishing
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Meoni, Marco, Boccali, Tommaso, Magini, Nicolò, Menichetti, Luca, and Giordano, Domenico. XRootD popularity on hadoop clusters. United States: N. p., 2017. Web. doi:10.1088/1742-6596/898/7/072027.
Meoni, Marco, Boccali, Tommaso, Magini, Nicolò, Menichetti, Luca, & Giordano, Domenico. XRootD popularity on hadoop clusters. United States. https://doi.org/10.1088/1742-6596/898/7/072027
Meoni, Marco, Boccali, Tommaso, Magini, Nicolò, Menichetti, Luca, and Giordano, Domenico. 2017. "XRootD popularity on hadoop clusters". United States. https://doi.org/10.1088/1742-6596/898/7/072027. https://www.osti.gov/servlets/purl/1831862.
@article{osti_1831862,
title = {XRootD popularity on hadoop clusters},
author = {Meoni, Marco and Boccali, Tommaso and Magini, Nicolò and Menichetti, Luca and Giordano, Domenico},
abstractNote = {Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.},
doi = {10.1088/1742-6596/898/7/072027},
journal = {Journal of Physics. Conference Series},
number = 7,
volume = 898,
place = {United States},
year = {2017},
month = {11}
}