U.S. Department of Energy
Office of Scientific and Technical Information

Distributed Machine Learning Workflow with PanDA and iDDS in LHC ATLAS

Journal Article · EPJ Web of Conferences (Online)
  1. Brookhaven National Laboratory (BNL), Upton, NY (United States)
  2. Univ. of Wisconsin, Madison, WI (United States)
  3. Univ. of Texas, Arlington, TX (United States)
  4. Univ. of Pittsburgh, PA (United States)
Machine Learning (ML) has become an important tool for High Energy Physics analysis. As datasets at the Large Hadron Collider (LHC) grow and search spaces expand to exploit the full physics potential, ML tasks require ever more computing resources. In addition, advanced ML workflows are being developed in which a task may depend on the results of previous tasks. How to use the vast distributed CPUs and GPUs of the Worldwide LHC Computing Grid (WLCG) for these large, complex ML tasks has become a popular research area. In this paper, we present our efforts to enable the execution of distributed ML workflows on the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS). First, we describe how PanDA and iDDS handle large-scale ML workflows, including the implementation that processes workloads on diverse, geographically distributed computing resources. Next, we report real-world use cases, such as Hyperparameter Optimization, Monte Carlo Toy confidence limit calculation, and Active Learning. Finally, we conclude with future plans.
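To illustrate what "one task may depend on the results of previous tasks" can mean for a use case such as hyperparameter optimization, the following is a minimal, purely local sketch. It does not use the PanDA or iDDS APIs; the names `objective`, `run_workflow`, `evaluate_*`, and `select_best` are hypothetical, and the toy loss function is an assumption chosen only so the example runs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def objective(learning_rate):
    # Hypothetical stand-in for a training job: a toy loss minimized at 0.1.
    return (learning_rate - 0.1) ** 2


def run_workflow(tasks, dependencies):
    """Run tasks in dependency order, passing each task its predecessors' results."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](inputs)
    return results


# Assumed candidate hyperparameter points; each evaluation is an independent task.
candidates = [0.01, 0.1, 1.0]
tasks = {}
dependencies = {}
for i, lr in enumerate(candidates):
    tasks[f"evaluate_{i}"] = lambda inputs, lr=lr: {
        "learning_rate": lr,
        "loss": objective(lr),
    }
    dependencies[f"evaluate_{i}"] = set()

# The selection task depends on every evaluation task.
tasks["select_best"] = lambda inputs: min(inputs.values(), key=lambda r: r["loss"])
dependencies["select_best"] = {f"evaluate_{i}" for i in range(len(candidates))}

results = run_workflow(tasks, dependencies)
best = results["select_best"]
```

In a distributed setting the independent evaluation tasks could run on different resources, with only the selection step waiting for all of them; this sketch runs everything sequentially in one process.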
Research Organization:
Brookhaven National Laboratory (BNL), Upton, NY (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
SC0012704
OSTI ID:
2428916
Report Number(s):
BNL--225899-2024-JAAM
Journal Information:
EPJ Web of Conferences (Online), Vol. 295; ISSN 2100-014X
Publisher:
EDP Sciences
Country of Publication:
United States
Language:
English
