The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
Abstract
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate $\mathcal{O}$(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics that assist CMS data operators in assessing the performance of the CMS computing system.
- Authors:
- Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi
- Cornell Univ., Ithaca, NY (United States)
- Heidelberg Univ. (Germany)
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Publication Date:
- 2018-03-19
- Research Org.:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- OSTI Identifier:
- 1437402
- Report Number(s):
- arXiv:1801.03872; FERMILAB-PUB-18-074-CD
- Journal ID: ISSN 2510-2036; 1647570; TRN: US1900324
- Grant/Contract Number:
- AC02-07CH11359
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Computing and Software for Big Science
- Additional Journal Information:
- Journal Volume: 2; Journal Issue: 1; Journal ID: ISSN 2510-2036
- Publisher:
- Springer
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS
Citation Formats
Kuznetsov, Valentin, Fischer, Nils Leif, and Guo, Yuyi. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC. United States: N. p., 2018. Web. doi:10.1007/s41781-018-0005-0.
Kuznetsov, Valentin, Fischer, Nils Leif, & Guo, Yuyi. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC. United States. https://doi.org/10.1007/s41781-018-0005-0
Kuznetsov, Valentin, Fischer, Nils Leif, and Guo, Yuyi. 2018. "The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC". United States. https://doi.org/10.1007/s41781-018-0005-0. https://www.osti.gov/servlets/purl/1437402.
@article{osti_1437402,
title = {The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC},
author = {Kuznetsov, Valentin and Fischer, Nils Leif and Guo, Yuyi},
abstractNote = {The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and the Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate $\mathcal{O}$(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics that assist CMS data operators in assessing the performance of the CMS computing system.},
doi = {10.1007/s41781-018-0005-0},
journal = {Computing and Software for Big Science},
number = 1,
volume = 2,
place = {United States},
year = {2018},
month = {mar}
}