OSTI.GOV — U.S. Department of Energy, Office of Scientific and Technical Information

Title: Data-parallel Python for High Energy Physics Analyses

Abstract

In this paper, we explore features available in Python which are useful for data reduction tasks in High Energy Physics (HEP). High-level abstractions in Python are convenient for implementing data reduction tasks. However, for such abstractions to be practical, they must also perform efficiently. Because the data sets we process are typically large, we care about both I/O performance and in-memory processing speed. In particular, we evaluate the use of data-parallel programming, using MPI and numpy, to process a large experimental data set (42 TiB) stored in an HDF5 file. We measure the speed of processing of the data, distinguishing between the time spent reading data and the time spent processing the data in memory, and demonstrate the scalability of both, using up to 1200 KNL nodes (76800 cores) on Cori at NERSC.
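The data-parallel pattern the abstract describes can be sketched with mpi4py, h5py, and numpy. The sketch below is illustrative only: the file name ("events.h5"), dataset path ("muons/pt"), selection cut, and histogram binning are hypothetical stand-ins rather than details from the paper, and each rank simply opens the file read-only and reads a disjoint slice instead of using a parallel (MPI-IO) HDF5 driver. Read time and in-memory compute time are timed separately, mirroring the measurement split the authors report.

import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

t0 = MPI.Wtime()
with h5py.File("events.h5", "r") as f:   # hypothetical file; each rank opens it read-only
    dset = f["muons/pt"]                 # hypothetical 1-D dataset of muon pT values
    n = dset.shape[0]
    lo = rank * n // size                # contiguous, disjoint slice for this rank
    hi = (rank + 1) * n // size
    pt = dset[lo:hi]                     # HDF5 read of only this rank's rows
t_read = MPI.Wtime() - t0

t0 = MPI.Wtime()
selected = pt[pt > 25.0]                 # vectorized cut, entirely in numpy
local_hist, _ = np.histogram(selected, bins=50, range=(0.0, 500.0))
t_compute = MPI.Wtime() - t0

# Combine per-rank results; object-mode reduce handles numpy arrays and Python floats.
hist = comm.reduce(local_hist, op=MPI.SUM, root=0)
read_max = comm.reduce(t_read, op=MPI.MAX, root=0)
compute_max = comm.reduce(t_compute, op=MPI.MAX, root=0)
if rank == 0:
    print(f"read: {read_max:.2f} s  compute: {compute_max:.2f} s  entries: {hist.sum()}")

A script like this would be launched with mpirun or srun, one process per core, so that on a machine such as Cori each KNL node hosts many ranks, each reading and reducing its own portion of the dataset.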

Authors:
Paterno, Marc [1]; Green, C. [1]; Kowalski, J. [1]; Sehrish, S. [1]
  1. Fermilab
Publication Date:
October 2018
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1490837
Report Number(s):
FERMILAB-CONF-18-577-CD
1712348
DOE Contract Number:  
AC02-07CH11359
Resource Type:
Conference
Country of Publication:
United States
Language:
English

Citation Formats

Paterno, Marc, Green, C., Kowalski, J., and Sehrish, S. Data-parallel Python for High Energy Physics Analyses. United States: N. p., 2018. Web.
Paterno, Marc, Green, C., Kowalski, J., & Sehrish, S. Data-parallel Python for High Energy Physics Analyses. United States.
Paterno, Marc, Green, C., Kowalski, J., and Sehrish, S. 2018. "Data-parallel Python for High Energy Physics Analyses". United States. https://www.osti.gov/servlets/purl/1490837.
@article{osti_1490837,
title = {Data-parallel Python for High Energy Physics Analyses},
author = {Paterno, Marc and Green, C. and Kowalski, J. and Sehrish, S.},
abstractNote = {In this paper, we explore features available in Python which are useful for data reduction tasks in High Energy Physics (HEP). High-level abstractions in Python are convenient for implementing data reduction tasks. However, for such abstractions to be practical, they must also perform efficiently. Because the data sets we process are typically large, we care about both I/O performance and in-memory processing speed. In particular, we evaluate the use of data-parallel programming, using MPI and numpy, to process a large experimental data set (42 TiB) stored in an HDF5 file. We measure the speed of processing of the data, distinguishing between the time spent reading data and the time spent processing the data in memory, and demonstrate the scalability of both, using up to 1200 KNL nodes (76800 cores) on Cori at NERSC.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2018},
month = {10}
}

Conference:
Other availability
Please see Document Availability for additional information on obtaining the full-text document. Library patrons may search WorldCat to identify libraries that hold this conference proceeding.
