Title: Analyzing How We Do Analysis and Consume Data, Results from the SciDAC-Data Project

Conference · OSTI ID: 1394826

One of the main goals of the Department of Energy-funded SciDAC-Data project is to analyze the more than 410,000 high energy physics (HEP) datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta-information for these datasets and analysis projects are being combined with knowledge of their place in the HEP analysis chains of the major experiments to understand how modern computing and data delivery are being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. They provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to mine these large HEP data volumes more efficiently, and into the physics analysis requirements for the next generation of HEP computing facilities. In particular, we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016), and a first look at the analysis of the Tevatron Run II experiments. We present statistical distributions that characterize these data, together with data-driven models describing their consumption.

Research Organization:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES)
DOE Contract Number:
AC02-06CH11357
OSTI ID:
1394826
Resource Relation:
Conference: 2016 CHEP Conference, San Francisco, CA (United States), 10-14 Oct 2016
Country of Publication:
United States
Language:
English

Similar Records

SciDAC-Data, A Project to Enabling Data Driven Modeling of Exascale Computing
Conference · 10 Oct 2016 · OSTI ID:1394826

Scidac-Data: Enabling Data Driven Modeling of Exascale Computing
Journal Article · 23 Nov 2017 · Journal of Physics. Conference Series · OSTI ID:1394826

Distributed data access in the sequential access model at the D0 experiment at Fermilab
Conference · 05 Jul 2000 · OSTI ID:1394826