DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Optimizing CMS build infrastructure via Apache Mesos

Abstract

The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general-use open-source code. A critical ingredient to the success of the construction and early operation of the WLCG was the convergence, around the year 2000, on the use of a homogeneous environment of commodity x86-64 processors and Linux. Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other applications on a dynamically shared pool of nodes. Lastly, we present how we migrated our continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster and how this resulted in better resource usage, higher peak performance, and lower latency thanks to the dynamic scheduling capabilities of Mesos.

Authors:
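 Abdurachmanov, David [1]; Degano, Alessandro [2]; Elmer, Peter [3]; Eulisse, Giulio [1]; Mendez, David [4]; Muzaffar, Shahzad [1]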
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  2. Univ. di Torino, Torino (Italy)
  3. Princeton Univ., Princeton, NJ (United States)
  4. Univ. de los Andes, Bogota (Colombia)
Publication Date:
2015-12-23
Research Org.:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP)
OSTI Identifier:
1346389
Report Number(s):
arXiv:1507.07429; FERMILAB-CONF-15-661-CMS
Journal ID: ISSN 1742-6588; 1385108
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physics. Conference Series
Additional Journal Information:
Journal Volume: 664; Journal Issue: 6; Journal ID: ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS

Citation Formats

Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, and Muzaffar, Shahzad. Optimizing CMS build infrastructure via Apache Mesos. United States: N. p., 2015. Web. doi:10.1088/1742-6596/664/6/062013.
Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, & Muzaffar, Shahzad. Optimizing CMS build infrastructure via Apache Mesos. United States. https://doi.org/10.1088/1742-6596/664/6/062013
Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, and Muzaffar, Shahzad. 2015. "Optimizing CMS build infrastructure via Apache Mesos". United States. https://doi.org/10.1088/1742-6596/664/6/062013. https://www.osti.gov/servlets/purl/1346389.
@article{osti_1346389,
title = {Optimizing CMS build infrastructure via Apache Mesos},
author = {Abdurachmanov, David and Degano, Alessandro and Elmer, Peter and Eulisse, Giulio and Mendez, David and Muzaffar, Shahzad},
abstractNote = {The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general-use open-source code. A critical ingredient to the success of the construction and early operation of the WLCG was the convergence, around the year 2000, on the use of a homogeneous environment of commodity x86-64 processors and Linux. Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other applications on a dynamically shared pool of nodes. Lastly, we present how we migrated our continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster and how this resulted in better resource usage, higher peak performance, and lower latency thanks to the dynamic scheduling capabilities of Mesos.},
doi = {10.1088/1742-6596/664/6/062013},
journal = {Journal of Physics. Conference Series},
number = 6,
volume = 664,
place = {United States},
year = {2015},
month = {12}
}

Works referenced in this record:

Large-scale cluster management at Google with Borg
conference, April 2015

  • Verma, Abhishek; Pedrosa, Luis; Korupolu, Madhukar
  • Proceedings of the Tenth European Conference on Computer Systems
  • DOI: 10.1145/2741948.2741964

Works referencing / citing this record:

A cloud-agnostic queuing system to support the implementation of deadline-based application execution policies
journal, December 2019