Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2

Balcas, J.; Belforte, S.; Bockelman, B.; Gutsche, O.; Khan, F.; Larson, K.; Letts, J.; Mascheroni, M.; Mason, D.; McCrea, A.; Saiz-Santos, M.; Sfiligoi, I.

doi:10.1088/1742-6596/664/6/062030


Conference · Wed Dec 23 2015 · J.Phys.Conf.Ser.

DOI: https://doi.org/10.1088/1742-6596/664/6/062030 · OSTI ID: 1247508

Balcas, J. [1]; Belforte, S. [2]; Bockelman, B. [3]; Gutsche, O. [4]; Khan, F. [5]; Larson, K. [4]; Letts, J. [6]; Mascheroni, M. [7]; Mason, D. [4]; McCrea, A. [6]; Saiz-Santos, M. [6]; Sfiligoi, I. [6]

[1] Vilnius U.
[2] Trieste U.
[3] Nebraska U.
[4] Fermilab
[5] Quaid-i-Azam U.
[6] UC, San Diego
[7] Milan Bicocca U.

The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far, we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and to share resources more effectively between the various activities in CMS. The major challenge of this unification is scale: the combined pool is expected to reach 200K job slots, significantly more than any other multi-user HTCondor-based system currently in production. To get there, we have studied the scaling limitations of our existing pools, the largest of which tops out at about 70K slots, and provided feedback to the development communities. They have responded with improvements that have let us reach progressively larger scales with greater stability. We have also worked on improving the organization and support model for this critical service during Run 2 of the LHC. This contribution presents the results of the scale testing and experiences from the first months of running the Global Pool.


Research Organization: Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)

Sponsoring Organization: USDOE Office of Science (SC), High Energy Physics (HEP)

DOE Contract Number: AC02-07CH11359

OSTI ID: 1247508

Report Number(s): FERMILAB-CONF-15-604-CD; 1413951

Journal Information: J.Phys.Conf.Ser., Vol. 664, Issue 6; Conference: 21st International Conference on Computing in High Energy and Nuclear Physics, Okinawa, Japan, April 13-17, 2015

Country of Publication: United States

Language: English

Similar Records

Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits

Journal Article · Wed Nov 22 2017 · Journal of Physics. Conference Series

Balcas, J.; Bockelman, B.; Hufnagel, D.; +9 more

Exploring GlideinWMS and HTCondor scalability frontiers for an expanding CMS Global Pool

Journal Article · Tue Sep 17 2019 · EPJ Web of Conferences

Pérez-Calero Yzquierdo, Antonio; Bockelman, Brian Paul; Davila Foyo, Diego; +8 more

Effective HTCondor-based monitoring system for CMS

Journal Article · Tue Nov 21 2017 · Journal of Physics. Conference Series

Balcas, J.; Bockelman, B. P.; Da Silva, J. M.; +7 more
