Integration of Titan supercomputer at OLCF with ATLAS Production System

Barreiro Megino, F.; De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wells, J.; Wenaus, T.

doi:10.1088/1742-6596/898/9/092002

Journal Article · October 1, 2017 · Journal of Physics. Conference Series

The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA-managed resources are distributed worldwide across hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data-taking runs will require more resources than Grid computing can possibly provide, so additional computing and storage resources are needed. ATLAS is therefore engaged in an ambitious program to expand its current computing model to include additional resources, such as the opportunistic use of supercomputers. In this paper we describe a project aimed at integrating the ATLAS Production System with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). The current approach uses a modified PanDA Pilot framework for job submission to Titan's batch queues and for local data management, with lightweight MPI wrappers that run single-node workloads in parallel on Titan's multi-core worker nodes. This allows standard ATLAS production jobs to run on otherwise unused resources (backfill) on Titan. The system has already allowed ATLAS to collect millions of core-hours per month on Titan and to execute hundreds of thousands of jobs, while simultaneously improving Titan's utilization efficiency. We discuss the details of the implementation, current experience with running the system, and future plans aimed at improving scalability and efficiency.
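The abstract's core mechanism, a thin MPI wrapper that lets many independent single-node jobs share one batch allocation, can be illustrated with a minimal sketch. This is not the PanDA Pilot code. The `Payload` type, `run_rank`, and `simulate_allocation` are hypothetical names. The MPI launch is replaced by a local thread pool so the sketch runs anywhere. In a real wrapper the rank would be read from the MPI communicator (for example `MPI.COMM_WORLD.Get_rank()` in mpi4py), and the system launcher would start one copy per worker node.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Payload:
    """One independent single-node job (hypothetical structure)."""
    job_id: str
    command: list[str]


def run_rank(rank: int, payloads: list[Payload], base_dir: Path) -> tuple[str, int]:
    """Run the payload assigned to `rank` in a private working directory.

    In a real MPI wrapper, `rank` would come from the communicator. Ranks never
    exchange messages: each one just runs an ordinary serial job.
    """
    if rank >= len(payloads):
        # More ranks than payloads: this rank has nothing to do.
        return ("idle", 0)
    payload = payloads[rank]
    workdir = base_dir / f"rank_{rank:05d}_{payload.job_id}"
    workdir.mkdir(parents=True, exist_ok=True)
    # Capture the payload's stdout and stderr in a per-rank log file.
    with open(workdir / "payload.log", "w") as log:
        result = subprocess.run(
            payload.command, cwd=workdir, stdout=log, stderr=subprocess.STDOUT
        )
    return (payload.job_id, result.returncode)


def simulate_allocation(
    payloads: list[Payload], n_ranks: int, base_dir: Path
) -> list[tuple[str, int]]:
    """Local stand-in for an MPI launch: run every rank concurrently."""
    with ThreadPoolExecutor(max_workers=n_ranks) as pool:
        # pool.map preserves rank order in the returned results.
        return list(pool.map(lambda r: run_rank(r, payloads, base_dir), range(n_ranks)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        jobs = [
            Payload("job-001", [sys.executable, "-c", "print('payload one')"]),
            Payload("job-002", [sys.executable, "-c", "print('payload two')"]),
        ]
        for job_id, code in simulate_allocation(jobs, n_ranks=3, base_dir=Path(tmp)):
            print(job_id, code)
```

Running the demo prints `job-001 0`, `job-002 0`, and `idle 0`, because the third rank has no payload. The design point the sketch tries to capture is that ranks never communicate. MPI serves only as a launch vehicle, so that one batch submission sized to fit the available resources can carry many serial payloads.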

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)

Sponsoring Organization: USDOE Office of Science (SC)

Contributing Organization: ATLAS Collaboration

OSTI ID: 1567554

Journal Information: Journal of Physics. Conference Series, Vol. 898; ISSN 1742-6588

Publisher: IOP Publishing

Country of Publication: United States

Language: English

Similar Records

Integration of Titan supercomputer at OLCF with ATLAS Production System

Conference · October 1, 2017 · OSTI ID:1567554

Barreiro Megino, F.; De, K.; Jha, S.; +7 more

Integration of PanDA workload management system with Titan supercomputer at OLCF

Journal Article · December 23, 2015 · Journal of Physics. Conference Series · OSTI ID:1567554

De, K.; Klimentov, A.; Oleynik, D.; +5 more

Integration of PanDA workload management system with supercomputers

Conference · January 1, 2016 · OSTI ID:1567554

De, K.; Jha, S.; Maeno, T.; +13 more

Related Subjects

97 MATHEMATICS AND COMPUTING
