Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
Abstract
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing across hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10²) sites, O(10⁵) cores, O(10⁸) jobs per year, and O(10³) users, and the ATLAS data volume is O(10¹⁷) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project, titled "Next Generation Workload Management and Analysis System for Big Data" (BigPanDA), is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute", together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we present our current accomplishments in running the PanDA WMS at the OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal, independent of the computing facility's infrastructure, for High Energy and Nuclear Physics as well as other data-intensive science applications.
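The abstract's claim that physicists "see a single computing facility" rests on PanDA's pilot-based, late-binding scheduling: lightweight pilot processes on each site pull jobs from a central server only when a worker slot is actually free. The following is a minimal, entirely hypothetical Python sketch of that pull model; the names `JobQueue` and `Pilot` are illustrative stand-ins, not part of any PanDA API.

```python
import queue

class JobQueue:
    """Stand-in for the central server's table of queued jobs."""

    def __init__(self, jobs):
        self._q = queue.Queue()
        for job in jobs:
            self._q.put(job)

    def get_job(self):
        # A pilot asks for work; an empty queue means the pilot exits quietly.
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

class Pilot:
    """Stand-in for a pilot process running on one worker node at a site."""

    def __init__(self, site):
        self.site = site
        self.done = []

    def run(self, server):
        # Pull-until-empty loop: the job is bound to a concrete site only
        # at this moment, which is what makes the scheduling "late-binding".
        while (job := server.get_job()) is not None:
            self.done.append((self.site, job))

server = JobQueue(["job-1", "job-2", "job-3"])
pilot = Pilot("SITE-A")
pilot.run(server)
print(pilot.done)  # every queued job ends up bound to the one free pilot
```

Because jobs are pulled rather than pushed, a new resource type (a cloud VM, an HPC backfill slot) can join the system simply by running a pilot, which is one way to read the paper's strategy of reusing existing PanDA components on new platforms.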
- Authors:
-
- Brookhaven National Lab. (BNL), Upton, NY (United States)
- European Organization for Nuclear Research (CERN), Geneva (Switzerland)
- Univ. of Texas, Arlington, TX (United States)
- Rutgers Univ., Piscataway, NJ (United States)
- SLAC National Accelerator Lab., Menlo Park, CA (United States)
- Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Publication Date:
- May 22, 2015
- Research Org.:
- Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
- Sponsoring Org.:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- Contributing Org.:
- Brookhaven National Lab. (BNL), Upton, NY (United States); Univ. of Texas, Arlington, TX (United States)
- OSTI Identifier:
- 1265526
- Grant/Contract Number:
- AC05-00OR22725; AC02-98CH10886; AC02-06CH11357
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Journal of Physics. Conference Series
- Additional Journal Information:
- Journal Volume: 608; Journal Issue: 1; Journal ID: ISSN 1742-6588
- Publisher:
- IOP Publishing
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Klimentov, A., Buncic, P., De, K., Jha, S., Maeno, T., Mount, R., Nilsson, P., Oleynik, D., Panitkin, S., Petrosyan, A., Porter, R. J., Read, K. F., Vaniachine, A., Wells, J. C., and Wenaus, T.. Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing. United States: N. p., 2015.
Web. doi:10.1088/1742-6596/608/1/012040.
Klimentov, A., Buncic, P., De, K., Jha, S., Maeno, T., Mount, R., Nilsson, P., Oleynik, D., Panitkin, S., Petrosyan, A., Porter, R. J., Read, K. F., Vaniachine, A., Wells, J. C., & Wenaus, T.. Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing. United States. https://doi.org/10.1088/1742-6596/608/1/012040
Klimentov, A., Buncic, P., De, K., Jha, S., Maeno, T., Mount, R., Nilsson, P., Oleynik, D., Panitkin, S., Petrosyan, A., Porter, R. J., Read, K. F., Vaniachine, A., Wells, J. C., and Wenaus, T.. May 22, 2015.
"Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing". United States. https://doi.org/10.1088/1742-6596/608/1/012040. https://www.osti.gov/servlets/purl/1265526.
@article{osti_1265526,
title = {Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing},
author = {Klimentov, A. and Buncic, P. and De, K. and Jha, S. and Maeno, T. and Mount, R. and Nilsson, P. and Oleynik, D. and Panitkin, S. and Petrosyan, A. and Porter, R. J. and Read, K. F. and Vaniachine, A. and Wells, J. C. and Wenaus, T.},
abstractNote = {The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing across hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10²) sites, O(10⁵) cores, O(10⁸) jobs per year, and O(10³) users, and the ATLAS data volume is O(10¹⁷) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project, titled "Next Generation Workload Management and Analysis System for Big Data" (BigPanDA), is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute", together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we present our current accomplishments in running the PanDA WMS at the OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal, independent of the computing facility's infrastructure, for High Energy and Nuclear Physics as well as other data-intensive science applications.},
doi = {10.1088/1742-6596/608/1/012040},
journal = {Journal of Physics. Conference Series},
number = 1,
volume = 608,
place = {United States},
year = {2015},
month = {May}
}
Works referenced in this record:
PanDA: distributed production and distributed analysis system for ATLAS
journal, July 2008
- Maeno, T.
- Journal of Physics: Conference Series, Vol. 119, Issue 6
Open Science Grid Study of the Coupling between Conformation and Water Content in the Interior of a Protein
journal, October 2008
- Damjanović, Ana; Miller, Benjamin T.; Wenaus, Torre J.
- Journal of Chemical Information and Modeling, Vol. 48, Issue 10
The antimatter spectrometer (AMS-02): A particle physics detector in space
journal, April 2008
- Battiston, Roberto
- Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Vol. 588, Issue 1-2
Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC
journal, September 2012
- Aad, G.; Abajyan, T.; Abbott, B.
- Physics Letters B, Vol. 716, Issue 1
Works referencing / citing this record:
PanDA Workload Management System Meta-data Segmentation
journal, January 2015
- Golosova, M.; Grigorieva, M.; Klimentov, A.
- Procedia Computer Science, Vol. 66