U.S. Department of Energy
Office of Scientific and Technical Information

XRootD popularity on hadoop clusters

Journal Article · Journal of Physics. Conference Series
 [1];  [2];  [3];  [4];  [5]
  1. Univ. of Pisa (Italy); Istituto Nazionale di Fisica Nucleare (INFN), Pisa (Italy)
  2. Istituto Nazionale di Fisica Nucleare (INFN), Pisa (Italy)
  3. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  4. European Organization for Nuclear Research (CERN), Geneva (Switzerland)
  5. European Organization for Nuclear Research (CERN), Geneva (Switzerland); CMS Collaboration, et al.
Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures to improve the throughput and efficiency of such monitoring. A large set of operational data (user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs) is being injected into a readily available Hadoop cluster via several data streamers. The collected metadata is further organized by running fast arbitrary queries; this makes it possible to test several MapReduce-based frameworks and to measure the system speed-up relative to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top of it, it is possible to design a mining platform that predicts dataset popularity and discovers patterns and correlations.
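As a rough illustration of the kind of MapReduce-style aggregation the abstract refers to, the sketch below counts reads and distinct users per dataset in plain Python. It is not the authors' implementation: the record layout, the dataset names, and the function names (`map_phase`, `reduce_phase`, `dataset_popularity`) are all hypothetical, and a real deployment would run on a Hadoop framework rather than in a single process.

```python
from functools import reduce

# Hypothetical file-access records: (dataset name, user, number of reads).
# Names and values are illustrative only, not CMS monitoring data.
ACCESS_LOG = [
    ("/A/Run1/AOD", "alice", 3),
    ("/B/Run1/AOD", "bob", 1),
    ("/A/Run1/AOD", "bob", 2),
    ("/A/Run1/AOD", "alice", 1),
]


def map_phase(record):
    """Emit a (key, value) pair keyed by dataset name."""
    dataset, user, reads = record
    return dataset, (reads, {user})


def reduce_phase(acc, pair):
    """Merge one mapped pair into the running per-dataset totals."""
    dataset, (reads, users) = pair
    total_reads, seen_users = acc.get(dataset, (0, set()))
    acc[dataset] = (total_reads + reads, seen_users | users)
    return acc


def dataset_popularity(records):
    """Return total reads and distinct-user count for each dataset."""
    grouped = reduce(reduce_phase, map(map_phase, records), {})
    return {
        dataset: {"reads": reads, "users": len(users)}
        for dataset, (reads, users) in grouped.items()
    }


popularity = dataset_popularity(ACCESS_LOG)
```

Here popularity is reduced to two simple counters per dataset; any actual popularity metric or prediction model used in the paper is not specified in this record.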
Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
Contributing Organization:
CMS Collaboration
Grant/Contract Number:
AC02-07CH11359
OSTI ID:
1831862
Report Number(s):
FERMILAB-PUB--17-715-CMS; oai:inspirehep.net:1638557
Journal Information:
Journal Name: Journal of Physics. Conference Series; Journal Volume: 898; Journal Issue: 7; ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English

Cited By (1)

Dataset Popularity Prediction for Caching of CMS Big Data journal February 2018
