
Data-parallel Python for High Energy Physics Analyses

Conference · OSTI ID:1490837
In this paper, we explore features available in Python that are useful for data reduction tasks in High Energy Physics (HEP). High-level abstractions in Python are convenient for implementing data reduction tasks. However, for such abstractions to be practical, they must also perform efficiently. Because the data sets we process are typically large, we care about both I/O performance and in-memory processing speed. In particular, we evaluate the use of data-parallel programming, using MPI and NumPy, to process a large experimental data set (42 TiB) stored in an HDF5 file. We measure the processing speed, distinguishing between the time spent reading data and the time spent processing it in memory, and demonstrate the scalability of both using up to 1200 KNL nodes (76800 cores) on Cori at NERSC.
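The data-parallel pattern the abstract describes can be illustrated with a minimal sketch. It is not the paper's code: the names `block_range` and `process_slice`, the threshold cut, and the synthetic data are all hypothetical. To stay self-contained, the ranks are simulated in a loop and an in-memory NumPy array stands in for the HDF5 data set; in a real run each rank would presumably be an MPI process reading its own slice of the file. The sketch shows the two ideas named above: each rank owns a contiguous block of rows, and read time is measured separately from in-memory processing time.

```python
import time

import numpy as np


def block_range(n_rows, n_ranks, rank):
    """Return the half-open [start, stop) row range owned by `rank`.

    Rows are split into contiguous blocks whose sizes differ by at most one.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def process_slice(dataset, start, stop, threshold):
    """Load one rank's rows, then reduce them, timing the two phases separately."""
    t0 = time.perf_counter()
    # Assumption: copying a slice stands in for reading it from the file.
    values = np.array(dataset[start:stop])
    t1 = time.perf_counter()
    # Vectorized selection: count entries above the (hypothetical) cut.
    n_selected = int(np.count_nonzero(values > threshold))
    t2 = time.perf_counter()
    return n_selected, t1 - t0, t2 - t1


# Hypothetical stand-in for one column of the on-disk data set.
rng = np.random.default_rng(0)
dataset = rng.exponential(scale=1.0, size=1_000_003)
n_ranks = 8
threshold = 2.0

# Simulated ranks; with MPI each iteration would run in its own process.
partials = [
    process_slice(dataset, *block_range(dataset.size, n_ranks, rank), threshold)
    for rank in range(n_ranks)
]

# Combining the partial counts plays the role of an MPI reduction.
total_selected = sum(p[0] for p in partials)
# The slowest rank bounds the wall-clock time of each phase.
read_time = max(p[1] for p in partials)
compute_time = max(p[2] for p in partials)

print(f"selected={total_selected} read={read_time:.6f}s compute={compute_time:.6f}s")
```

Because each rank touches only its own block, the per-rank work needs no communication until the final reduction, which is what makes this kind of selection a natural fit for data-parallel execution.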
Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Organization:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
DOE Contract Number:
AC02-07CH11359
OSTI ID:
1490837
Report Number(s):
FERMILAB-CONF-18-577-CD; 1712348
Country of Publication:
United States
Language:
English

Similar Records

Python and HPC for High Energy Physics Data Analyses
Journal Article · Sat Dec 31 19:00:00 EST 2016 · OSTI ID:1413085

Exploring the Performance of Spark for a Scientific Use Case
Conference · Thu Dec 31 23:00:00 EST 2015 · OSTI ID:1250827

Preparing NERSC users for Cori, a Cray XC40 system with Intel many integrated cores
Journal Article · Thu Aug 24 20:00:00 EDT 2017 · Concurrency and Computation: Practice and Experience · OSTI ID:1459400
