
Title: SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing

Abstract

The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project examines the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular, we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.
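
As a rough illustration of the kind of question such simulations address, the sketch below replays a file-access trace through a least-recently-used (LRU) cache and counts hits and misses. It is not the authors' simulation and does not use the SciDAC-Data distributions: the LRU policy, the synthetic trace generator, the 1/rank popularity weighting, and every function and parameter name here are assumptions made for illustration only.

```python
from collections import OrderedDict
import random


def simulate_lru(trace, capacity):
    """Replay a sequence of file IDs through an LRU cache.

    Returns (hits, misses). The cache holds at most `capacity` files
    (all files are assumed to be the same size, a simplification).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = misses = 0
    for file_id in trace:
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
        else:
            misses += 1
            cache[file_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits, misses


def synthetic_trace(n_jobs, n_files, files_per_job, seed=0):
    """Build a hypothetical trace: each job reads `files_per_job` files,
    with file popularity skewed by an assumed 1/rank weighting."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(n_files)]
    trace = []
    for _ in range(n_jobs):
        trace.extend(rng.choices(range(n_files), weights=weights, k=files_per_job))
    return trace


if __name__ == "__main__":
    trace = synthetic_trace(n_jobs=1000, n_files=500, files_per_job=10)
    for capacity in (10, 50, 200):
        hits, misses = simulate_lru(trace, capacity)
        print(f"capacity={capacity:4d}  hit rate={hits / (hits + misses):.3f}")
```

Sweeping the cache capacity, as the usage lines do, is one simple way to see how a design choice changes the hit rate for a given access pattern; a realistic study would replace the synthetic trace with measured workflow data and model queuing and tape or disk latencies as well.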

Authors:
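Mubarak, Misbah [1]; Ding, Pengfei [2]; Aliaga, Leo [2]; Tsaris, Aristeidis [2]; Norman, Andrew [2]; Lyon, Adam [2]; Ross, Robert [1]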
  1. Argonne National Lab. (ANL), Lemont, IL (United States)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Publication Date:
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1437988
Report Number(s):
FERMILAB-CONF-16-769-CD
Journal ID: ISSN 1742-6588; 1638174; TRN: US1900375
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physics. Conference Series
Additional Journal Information:
Journal Volume: 898; Journal Issue: 6; Journal ID: ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Mubarak, Misbah, Ding, Pengfei, Aliaga, Leo, Tsaris, Aristeidis, Norman, Andrew, Lyon, Adam, and Ross, Robert. SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing. United States: N. p., 2017. Web. doi:10.1088/1742-6596/898/6/062048.
Mubarak, Misbah, Ding, Pengfei, Aliaga, Leo, Tsaris, Aristeidis, Norman, Andrew, Lyon, Adam, & Ross, Robert. SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing. United States. doi:10.1088/1742-6596/898/6/062048.
Mubarak, Misbah, Ding, Pengfei, Aliaga, Leo, Tsaris, Aristeidis, Norman, Andrew, Lyon, Adam, and Ross, Robert. Thu . "SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing". United States. doi:10.1088/1742-6596/898/6/062048. https://www.osti.gov/servlets/purl/1437988.
@article{osti_1437988,
title = {SciDAC-Data: Enabling Data Driven Modeling of Exascale Computing},
author = {Mubarak, Misbah and Ding, Pengfei and Aliaga, Leo and Tsaris, Aristeidis and Norman, Andrew and Lyon, Adam and Ross, Robert},
abstractNote = {The SciDAC-Data project is a DOE-funded initiative to analyze and exploit two decades of information and analytics that have been collected by the Fermilab data center on the organization, movement, and consumption of high energy physics (HEP) data. The project examines the analysis patterns and data organization that have been used by NOvA, MicroBooNE, MINERvA, CDF, D0, and other experiments to develop realistic models of HEP analysis workflows and data processing. The SciDAC-Data project aims to provide both realistic input vectors and corresponding output data that can be used to optimize and validate simulations of HEP analysis. These simulations are designed to address questions of data handling, cache optimization, and workflow structures that are the prerequisites for modern HEP analysis chains to be mapped and optimized to run on the next generation of leadership-class exascale computing facilities. We present the use of a subset of the SciDAC-Data distributions, acquired from analysis of approximately 71,000 HEP workflows run on the Fermilab data center and corresponding to over 9 million individual analysis jobs, as the input to detailed queuing simulations that model the expected data consumption and caching behaviors of the work running in high performance computing (HPC) and high throughput computing (HTC) environments. In particular, we describe how the Sequential Access via Metadata (SAM) data-handling system in combination with the dCache/Enstore-based data archive facilities has been used to develop radically different models for analyzing the HEP data. We also show how the simulations may be used to assess the impact of design choices in archive facilities.},
doi = {10.1088/1742-6596/898/6/062048},
journal = {Journal of Physics. Conference Series},
number = 6,
volume = 898,
place = {United States},
year = {2017},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Figures / Tables:

Figure 1: Schematic layout of components of the Fermilab archive facility that are modeled in the simulation. Components of the data storage and management system that are beyond the scope of the simulation are not shown.

Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.