Using Apptainer in a Pilot-based Distributed Workload
- Fermilab
GlideinWMS is a pilot- and pressure-based workload manager for distributed scientific computing. Many experiments, such as CMS and Fermilab’s neutrino experiments, use it to provision elastic clusters for their analysis and simulations, which are split into close to a million concurrent jobs. Most user jobs require containers, and the pilots use Apptainer to set up the desired platform. For pilots that run as regular batch jobs, Apptainer is safer, lighter, and easier to use than other containerization solutions. Many images used by the pilots are expanded SIF images distributed via CernVM-FS, a combination that is very efficient. At Fermilab, for example, we store Dockerfiles on GitHub that mimic the platform on the worker nodes of local clusters. GitHub workflows build the images and push them to Docker Hub, and a service periodically pulls them and converts them to expanded SIF images in CernVM-FS, so scientists can find a familiar environment everywhere. Apptainer has also been used to run services inside the pilot jobs, such as benchmarks that characterize the worker node being used, or a Triton Inference Server that shares a GPU among all the jobs running in parallel on a node.
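The image pipeline described above (pull from Docker Hub, expand into a directory, run jobs from that directory) could look roughly like the following. This is a minimal sketch, not the Fermilab service or the GlideinWMS pilot code: the image name `example/worker-node:latest`, the path `/cvmfs/example.repo.org/images/worker-node`, and both helper functions are hypothetical. The helpers only assemble command lines and do not run them. Publishing the expanded directory into CernVM-FS is a separate step that is omitted here.

```python
import shlex


def build_sandbox_cmd(docker_ref, sandbox_dir):
    """Command that expands a Docker Hub image into a directory tree.

    `apptainer build --sandbox` writes an unpacked image directory
    instead of a single SIF file.
    """
    return ["apptainer", "build", "--sandbox", sandbox_dir, f"docker://{docker_ref}"]


def exec_in_image_cmd(image_dir, job_argv, binds=("/cvmfs",)):
    """Command that runs a job inside an expanded image directory."""
    cmd = ["apptainer", "exec"]
    for path in binds:
        # Make each host path visible inside the container.
        cmd += ["--bind", path]
    cmd.append(image_dir)
    cmd += list(job_argv)
    return cmd


if __name__ == "__main__":
    # Hypothetical image name and CernVM-FS path, for illustration only.
    image = "example/worker-node:latest"
    target = "/cvmfs/example.repo.org/images/worker-node"
    print(shlex.join(build_sandbox_cmd(image, target)))
    print(shlex.join(exec_in_image_cmd(target, ["cat", "/etc/os-release"])))
```

Running the script prints the two command lines. A conversion service or a pilot would pass lists like these to `subprocess.run` rather than printing them.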
- Research Organization:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- US Department of Energy
- DOE Contract Number:
- 89243024CSC000002
- OSTI ID:
- 2569178
- Report Number(s):
- FERMILAB-SLIDES-25-0090-CSAID; oai:inspirehep.net:2932492
- Country of Publication:
- United States
- Language:
- English
Similar Records
- Using Pilot Jobs and CernVM File System for Simplified Use of Containers and Software Distribution · Conference · Thu Dec 31 23:00:00 EST 2020 · OSTI ID: 1824852
- Creating Apptainer Workflows with Docker-Compose-like Utilities · Conference · Sun May 04 20:00:00 EDT 2025 · OSTI ID: 3029081
- Flexible Pilot Jobs Framework for Distributed High Throughput Computing · Technical Report · Thu Oct 02 00:00:00 EDT 2025 · OSTI ID: 2998402