Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits
- California Inst. of Technology (CalTech), Pasadena, CA (United States)
- Univ. of Nebraska, Lincoln, NE (United States)
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Univ. of Notre Dame, IN (United States)
- Quaid-I-Azam Univ., Islamabad (Pakistan)
- Univ. of California, San Diego, CA (United States)
- Univ. Estadual Paulista, São Paulo (Brazil)
- Port d'Informació Científica, Barcelona (Spain); Research Centre for Energy, Environment and Technology (CIEMAT), Madrid (Spain)
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the Global Pool to larger scales each year. These resources are also becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of running stably at ever higher scales while introducing new modes of operation, such as multi-core pilots, together with the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of LHC Run II, and how they were overcome.
- Research Organization:
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- Grant/Contract Number:
- AC02-07CH11359
- OSTI ID:
- 1420915
- Report Number(s):
- FERMILAB-CONF-16-754-CD; 1638488
- Journal Information:
- Journal of Physics. Conference Series, Vol. 898, Issue 5; Conference: 22nd International Conference on Computing in High Energy and Nuclear Physics, San Francisco, CA, 10/10-10/14/2016; ISSN 1742-6588
- Publisher:
- IOP Publishing
- Country of Publication:
- United States
- Language:
- English