Optimizing CMS build infrastructure via Apache Mesos
Abstract
The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general-use open-source code. A critical ingredient to the success of the construction and early operation of the WLCG was the convergence, around the year 2000, on the use of a homogeneous environment of commodity x86-64 processors and Linux. Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other applications on a dynamically shared pool of nodes. Lastly, we present how we migrated our continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster and how this resulted in better resource usage, higher peak performance, and lower latency thanks to the dynamic scheduling capabilities of Mesos.
- Authors:
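- Abdurachmanov, David; Degano, Alessandro; Elmer, Peter; Eulisse, Giulio; Mendez, David; Muzaffar, Shahzad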
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Univ. di Torino, Torino (Italy)
- Princeton Univ., Princeton, NJ (United States)
- Univ. de los Andes, Bogota (Colombia)
- Publication Date:
- 2015-12-23
- Research Org.:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- OSTI Identifier:
- 1346389
- Report Number(s):
- arXiv:1507.07429; FERMILAB-CONF-15-661-CMS
Journal ID: ISSN 1742-6588; 1385108
- Grant/Contract Number:
- AC02-07CH11359
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Physics. Conference Series
- Additional Journal Information:
- Journal Volume: 664; Journal Issue: 6; Journal ID: ISSN 1742-6588
- Publisher:
- IOP Publishing
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS
Citation Formats
Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, and Muzaffar, Shahzad. Optimizing CMS build infrastructure via Apache Mesos. United States: N. p., 2015. Web. doi:10.1088/1742-6596/664/6/062013.
Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, & Muzaffar, Shahzad. Optimizing CMS build infrastructure via Apache Mesos. United States. https://doi.org/10.1088/1742-6596/664/6/062013
Abdurachmanov, David, Degano, Alessandro, Elmer, Peter, Eulisse, Giulio, Mendez, David, and Muzaffar, Shahzad. 2015. "Optimizing CMS build infrastructure via Apache Mesos". United States. https://doi.org/10.1088/1742-6596/664/6/062013. https://www.osti.gov/servlets/purl/1346389.
@article{osti_1346389,
title = {Optimizing CMS build infrastructure via Apache Mesos},
author = {Abdurachmanov, David and Degano, Alessandro and Elmer, Peter and Eulisse, Giulio and Mendez, David and Muzaffar, Shahzad},
abstractNote = {The Offline Software of the CMS Experiment at the Large Hadron Collider (LHC) at CERN consists of 6M lines of in-house code, developed over a decade by nearly 1000 physicists, as well as a comparable amount of general-use open-source code. A critical ingredient to the success of the construction and early operation of the WLCG was the convergence, around the year 2000, on the use of a homogeneous environment of commodity x86-64 processors and Linux. Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, Jenkins, Spark, Aurora, and other applications on a dynamically shared pool of nodes. Lastly, we present how we migrated our continuous integration system to schedule jobs on a relatively small Apache Mesos-enabled cluster and how this resulted in better resource usage, higher peak performance, and lower latency thanks to the dynamic scheduling capabilities of Mesos.},
doi = {10.1088/1742-6596/664/6/062013},
journal = {Journal of Physics. Conference Series},
number = 6,
volume = 664,
place = {United States},
year = {2015},
month = {12}
}