
Title: A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database

Abstract

AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user’s output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers has constantly increased, up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.

Authors:
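Ciangottini, D. [1]; Balcas, J. [2]; Mascheroni, M. [3]; Rupeika, E. A. [4]; Vaandering, E. [3]; Riahi, H. [5]; Silva, J. M.D. [6]; Hernandez, J. M. [7]; Belforte, S. [8]; Ivanov, T. T. [9]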
  1. INFN, Perugia
  2. Caltech
  3. Fermilab
  4. Vilnius U.
  5. CERN
  6. Sao Paulo, IFT
  7. Madrid, CIEMAT
  8. INFN, Trieste
  9. Sofiya U.
Publication Date:
2017-11-22
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1420917
Report Number(s):
FERMILAB-CONF-16-756-CD
1638458
DOE Contract Number:
AC02-07CH11359
Resource Type:
Conference
Resource Relation:
Journal Name: J.Phys.Conf.Ser.; Journal Volume: 898; Journal Issue: 4; Conference: 22nd International Conference on Computing in High Energy and Nuclear Physics, San Francisco, CA, 10/10-10/14/2016
Country of Publication:
United States
Language:
English

Citation Formats

Ciangottini, D., Balcas, J., Mascheroni, M., Rupeika, E. A., Vaandering, E., Riahi, H., Silva, J. M.D., Hernandez, J. M., Belforte, S., and Ivanov, T. T. A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database. United States: N. p., 2017. Web. doi:10.1088/1742-6596/898/4/042048.
Ciangottini, D., Balcas, J., Mascheroni, M., Rupeika, E. A., Vaandering, E., Riahi, H., Silva, J. M.D., Hernandez, J. M., Belforte, S., & Ivanov, T. T. A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database. United States. doi:10.1088/1742-6596/898/4/042048.
Ciangottini, D., Balcas, J., Mascheroni, M., Rupeika, E. A., Vaandering, E., Riahi, H., Silva, J. M.D., Hernandez, J. M., Belforte, S., and Ivanov, T. T. 2017. "A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database". United States. doi:10.1088/1742-6596/898/4/042048. https://www.osti.gov/servlets/purl/1420917.
@article{osti_1420917,
title = {A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database},
author = {Ciangottini, D. and Balcas, J. and Mascheroni, M. and Rupeika, E. A. and Vaandering, E. and Riahi, H. and Silva, J. M.D. and Hernandez, J. M. and Belforte, S. and Ivanov, T. T.},
abstractNote = {AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user’s output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) for internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers has constantly increased, up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system's scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.},
doi = {10.1088/1742-6596/898/4/042048},
journal = {J.Phys.Conf.Ser.},
number = 4,
volume = 898,
place = {United States},
year = {2017},
month = {11}
}

