
Title: Parallel Event Selection on HPC Systems

Abstract

In their recent measurement of the neutrino oscillation parameters, NOvA uses a sample of approximately 25 million reconstructed spills to search for electron-neutrino appearance events. These events are stored in an n-tuple format, in 250 thousand ROOT files. File sizes range from a few hundred KiB to a few MiB; the full dataset is approximately 1.4 TiB. These millions of events are reduced to a few tens of events by the application of strict event selection criteria, and then summarized by a handful of numbers each, which are used in the extraction of the neutrino oscillation parameters. The NOvA event selection code is currently a serial C++ program that reads these n-tuples. The current table data format and organization and the selection/reduction processing involved provide us with an opportunity to explore alternate approaches to represent the data and implement the processing. We represent our n-tuple data in HDF5 format that is optimized for the HPC environment and which allows us to use the machine’s high-performance parallel I/O capabilities. We use MPI, numpy and h5py to implement our approach and compare the performance with the existing approach. We study the performance implications of using thousands of small files of different sizes as compared with one large file using HPC resources. This work has been done as part of the SciDAC project, “HEP analytics on HPC” in collaboration with the ASCR teams at ANL and LBNL.
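
As an illustration of the kind of data-parallel selection the abstract describes, the following is a minimal sketch and not the authors' code. It assumes the n-tuple is one table of rows, splits the rows into contiguous blocks (one per rank), and applies the same cuts within each block. The ranks are simulated in a plain loop and the columns are Python lists; in an actual run each rank would presumably read its slice from the HDF5 file with h5py under MPI and apply the cuts as numpy masks. The function names, column names (`energy`, `score`), and cut values are all hypothetical.

```python
# Sketch only: ranks are simulated in a loop; a real run would use MPI and HDF5.

def block_range(n_rows, n_ranks, rank):
    """Return the half-open [start, stop) row range owned by `rank`."""
    base, extra = divmod(n_rows, n_ranks)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def select_local(energy, score, start, stop, min_energy, min_score):
    """Return global row indices in [start, stop) passing both hypothetical cuts."""
    return [
        i for i in range(start, stop)
        if energy[i] >= min_energy and score[i] >= min_score
    ]


def run_selection(energy, score, n_ranks, min_energy=1.0, min_score=0.9):
    """Simulate every rank's local selection, then gather the results in rank order."""
    selected = []
    for rank in range(n_ranks):
        start, stop = block_range(len(energy), n_ranks, rank)
        selected.extend(
            select_local(energy, score, start, stop, min_energy, min_score)
        )
    return selected


# Toy columns with made-up values (not NOvA data).
energy = [0.5, 2.1, 1.7, 0.2, 3.0, 1.1, 0.9]
score = [0.95, 0.97, 0.40, 0.99, 0.92, 0.10, 0.93]
passed = run_selection(energy, score, n_ranks=3)
print(passed)  # rows 1 and 4 pass both cuts
```

Because the blocks are disjoint and together cover every row, the gathered result does not depend on the number of ranks, which makes a single-rank run a convenient correctness check.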

Authors:
 Paterno, Marc [1]; Kowalkowski, Jim [1]; Sehrish, Saba [1]
  1. Fermilab
Publication Date:
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1581426
Report Number(s):
FERMILAB-CONF-18-667-CD
oai:inspirehep.net:1761011
DOE Contract Number:  
AC02-07CH11359
Resource Type:
Conference
Journal Name:
EPJ Web Conf.
Additional Journal Information:
Journal Volume: 214; Conference: 23rd International Conference on Computing in High Energy and Nuclear Physics, Sofia, Bulgaria, 07/09-07/13/2018
Country of Publication:
United States
Language:
English

Citation Formats

Paterno, Marc, Kowalkowski, Jim, and Sehrish, Saba. Parallel Event Selection on HPC Systems. United States: N. p., 2019. Web. doi:10.1051/epjconf/201921404059.
Paterno, Marc, Kowalkowski, Jim, & Sehrish, Saba. Parallel Event Selection on HPC Systems. United States. doi:10.1051/epjconf/201921404059.
Paterno, Marc, Kowalkowski, Jim, and Sehrish, Saba. Tue . "Parallel Event Selection on HPC Systems". United States. doi:10.1051/epjconf/201921404059. https://www.osti.gov/servlets/purl/1581426.
@article{osti_1581426,
title = {Parallel Event Selection on HPC Systems},
author = {Paterno, Marc and Kowalkowski, Jim and Sehrish, Saba},
abstractNote = {In their recent measurement of the neutrino oscillation parameters, NOvA uses a sample of approximately 25 million reconstructed spills to search for electron-neutrino appearance events. These events are stored in an n-tuple format, in 250 thousand ROOT files. File sizes range from a few hundred KiB to a few MiB; the full dataset is approximately 1.4 TiB. These millions of events are reduced to a few tens of events by the application of strict event selection criteria, and then summarized by a handful of numbers each, which are used in the extraction of the neutrino oscillation parameters. The NOvA event selection code is currently a serial C++ program that reads these n-tuples. The current table data format and organization and the selection/reduction processing involved provide us with an opportunity to explore alternate approaches to represent the data and implement the processing. We represent our n-tuple data in HDF5 format that is optimized for the HPC environment and which allows us to use the machine’s high-performance parallel I/O capabilities. We use MPI, numpy and h5py to implement our approach and compare the performance with the existing approach. We study the performance implications of using thousands of small files of different sizes as compared with one large file using HPC resources. This work has been done as part of the SciDAC project, “HEP analytics on HPC” in collaboration with the ASCR teams at ANL and LBNL.},
doi = {10.1051/epjconf/201921404059},
journal = {EPJ Web Conf.},
number = {},
volume = 214,
place = {United States},
year = {2019},
month = {1}
}


