Tracing Service for Tracing-Driven Glidein Optimizations
- Fermilab
Glideins, also known as pilots, play a pivotal role in the GlideinWMS framework: they provide tailored execution environments for user jobs to run on diverse, complex resources in a distributed setting. The framework includes several heuristics that determine the behavior of a Glidein, such as how long to wait for new jobs and how long to wait before the Glidein terminates. However, knowing the resources used and the time spent during, for example, a Glidein failure and resubmission can be invaluable for optimizing it. Our idea is to add a tracing service to the Glidein that will provide closer observation of the end-to-end progress of a Glidein’s milestones and facilitate a better understanding of the overall framework and its reliability. The tracing service will not only gather more information about the Glidein itself to make way for optimizations but also allow user jobs to do the same.
- Research Organization:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
- DOE Contract Number:
- AC02-07CH11359
- OSTI ID:
- 2005212
- Report Number(s):
- FERMILAB-SLIDES-23-167-CSAID; oai:inspirehep.net:2701369
- Country of Publication:
- United States
- Language:
- English
Similar Records
Creating Unit Tests for GlideinWMS using AI tools
Conference
·
Mon Jul 29 00:00:00 EDT 2024
·
OSTI ID:2407089
Archival, anonymization and presentation of HTCondor logs with GlideinMonitor
Conference
·
Mon Feb 22 23:00:00 EST 2021
·
OSTI ID:1778683
Flexible Pilot Jobs Framework for Distributed High Throughput Computing
Technical Report
·
Thu Oct 02 00:00:00 EDT 2025
·
OSTI ID:2998402