OSTI.GOV
U.S. Department of Energy
Office of Scientific and Technical Information

Title: Python and HPC for High Energy Physics Data Analyses

Abstract

High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (from O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be made available across a system and analysed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.
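The HDF5-plus-MPI approach the abstract describes is commonly realized in Python with the h5py and mpi4py packages. The following is a minimal sketch of that pattern, not the authors' code: the file name, dataset layout, and cut value are illustrative assumptions. Each MPI rank reads its own contiguous slice of an event-level HDF5 dataset, applies a selection cut, and the per-rank counts are combined with an MPI reduction.

# Hypothetical sketch: parallel reduction over an HDF5 event file with
# mpi4py + h5py. Dataset names and the 30 GeV cut are illustrative.
import h5py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank opens the file read-only and reads only its own row range.
# (An h5py built against parallel HDF5 could instead pass driver="mpio".)
with h5py.File("events.h5", "r") as f:
    pt = f["muons/pt"]            # hypothetical transverse-momentum dataset
    n = pt.shape[0]
    lo = rank * n // size         # contiguous, near-equal split across ranks
    hi = (rank + 1) * n // size
    local_pt = pt[lo:hi]          # slicing reads only this rank's chunk

local_passed = np.count_nonzero(local_pt > 30.0)       # example selection cut
total = comm.reduce(local_passed, op=MPI.SUM, root=0)  # combine counts at rank 0

if rank == 0:
    print(f"events passing cut: {total} of {n}")

Run under an MPI launcher, e.g. mpirun -n 8 python select.py. The contiguous split keeps each rank's read sequential, which is the access pattern parallel file systems handle best.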

Authors:
Sehrish, S. [1]; Kowalkowski, J. [1]; Paterno, M. [1]; Green, C. [1]
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Publication Date:
2017
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1413085
Report Number(s):
FERMILAB-CONF-17-437-CD
1642376
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Journal Article: Accepted Manuscript
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS

Citation Formats

Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. Python and HPC for High Energy Physics Data Analyses. United States: N. p., 2017. Web. doi:10.1145/3149869.3149877.
Sehrish, S., Kowalkowski, J., Paterno, M., & Green, C. Python and HPC for High Energy Physics Data Analyses. United States. https://doi.org/10.1145/3149869.3149877
Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. 2017. "Python and HPC for High Energy Physics Data Analyses". United States. https://doi.org/10.1145/3149869.3149877. https://www.osti.gov/servlets/purl/1413085.
@article{osti_1413085,
title = {Python and HPC for High Energy Physics Data Analyses},
author = {Sehrish, S. and Kowalkowski, J. and Paterno, M. and Green, C.},
abstractNote = {High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (from O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be made available across a system and analysed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.},
doi = {10.1145/3149869.3149877},
url = {https://www.osti.gov/biblio/1413085},
place = {United States},
year = {2017},
month = {1}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record


Works referenced in this record:

Spark and HPC for High Energy Physics Data Analyses
conference, May 2017


Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC
journal, September 2012


ROOT — An object oriented data analysis framework
journal, April 1997


Works referencing / citing this record:

Waveform Signal Entropy and Compression Study of Whole-Building Energy Datasets
conference, January 2019