OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: High-Throughput Computing on High-Performance Platforms: A Case Study

Abstract

The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

Authors:
 Oleynik, D [1]; Panitkin, S [2]; Turilli, Matteo [3]; Angius, Alessio [3]; Oral, H Sarp [4]; De, K [1]; Klimentov, A [2]; Wells, Jack C. [4]; Jha, S [3]
  1. University of Texas at Arlington
  2. Brookhaven National Laboratory (BNL)
  3. Rutgers University
  4. ORNL
Publication Date:
2017-10-01
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR) (SC-21)
OSTI Identifier:
1410207
DOE Contract Number:
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: IEEE International Conference on eScience, Auckland, New Zealand, 10/24/2017-10/27/2017
Country of Publication:
United States
Language:
English

Citation Formats

Oleynik, D, Panitkin, S, Turilli, Matteo, Angius, Alessio, Oral, H Sarp, De, K, Klimentov, A, Wells, Jack C., and Jha, S. High-Throughput Computing on High-Performance Platforms: A Case Study. United States: N. p., 2017. Web. doi:10.1109/eScience.2017.43.
Oleynik, D, Panitkin, S, Turilli, Matteo, Angius, Alessio, Oral, H Sarp, De, K, Klimentov, A, Wells, Jack C., & Jha, S. (2017). High-Throughput Computing on High-Performance Platforms: A Case Study. United States. doi:10.1109/eScience.2017.43.
Oleynik, D, Panitkin, S, Turilli, Matteo, Angius, Alessio, Oral, H Sarp, De, K, Klimentov, A, Wells, Jack C., and Jha, S. 2017. "High-Throughput Computing on High-Performance Platforms: A Case Study". United States. doi:10.1109/eScience.2017.43. https://www.osti.gov/servlets/purl/1410207.
@article{osti_1410207,
title = {High-Throughput Computing on High-Performance Platforms: A Case Study},
author = {Oleynik, D and Panitkin, S and Turilli, Matteo and Angius, Alessio and Oral, H Sarp and De, K and Klimentov, A and Wells, Jack C. and Jha, S},
abstractNote = {The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resources. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.},
doi = {10.1109/eScience.2017.43},
journal = {},
place = {United States},
year = {2017},
month = {10}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.