DOE PAGES

Title: Python and HPC for High Energy Physics Data Analyses

High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end-user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (from O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be made available across a system and analyzed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.
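As a minimal sketch of the approach the abstract describes (HDF5 input read in parallel, with MPI used from Python via h5py and mpi4py), and not the authors' actual implementation, the following partitions an event-level dataset across MPI ranks and sums per-rank histograms onto rank 0. The file name cms_events.h5 and the dataset path /events/met are hypothetical placeholders.

# Minimal sketch, assuming an HDF5 file "cms_events.h5" with a 1-D dataset
# "/events/met" (one value per event); both names are hypothetical.
# Each rank opens the file read-only (no parallel-HDF5 build is needed for
# reads), reads only its own slice of events, and histograms it locally.
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

with h5py.File("cms_events.h5", "r") as f:
    met = f["/events/met"]          # e.g. missing transverse energy per event
    n = met.shape[0]
    lo = rank * n // size           # even partition of the event range
    hi = (rank + 1) * n // size
    local = met[lo:hi]              # each rank reads only its slice

# Per-rank partial histogram, then a global element-wise sum on rank 0.
local_hist, _ = np.histogram(local, bins=50, range=(0.0, 500.0))
global_hist = np.zeros_like(local_hist)
comm.Reduce(local_hist, global_hist, op=MPI.SUM, root=0)

if rank == 0:
    print("events histogrammed:", int(global_hist.sum()))

Run with, e.g., mpirun -n 4 python sketch.py; the same partition-then-reduce pattern generalizes to the selection and reduction stages of the workflow described in the abstract.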
Authors:
Sehrish, S. [1]; Kowalkowski, J. [1]; Paterno, M. [1]; Green, C. [1]
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Publication Date:
2017
Report Number(s):
FERMILAB-CONF-17-437-CD
1642376
Grant/Contract Number:
AC02-07CH11359
Type:
Accepted Manuscript
Research Org:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org:
USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS
OSTI Identifier:
1413085

Citation Formats:
Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. Python and HPC for High Energy Physics Data Analyses. United States: N. p., 2017. Web. doi:10.1145/3149869.3149877.
Sehrish, S., Kowalkowski, J., Paterno, M., & Green, C. (2017). Python and HPC for High Energy Physics Data Analyses. United States. doi:10.1145/3149869.3149877.
Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. 2017. "Python and HPC for High Energy Physics Data Analyses". United States. doi:10.1145/3149869.3149877. https://www.osti.gov/servlets/purl/1413085.
@article{osti_1413085,
title = {Python and HPC for High Energy Physics Data Analyses},
author = {Sehrish, S. and Kowalkowski, J. and Paterno, M. and Green, C.},
abstractNote = {High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end-user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (from O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be made available across a system and analyzed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.},
doi = {10.1145/3149869.3149877},
place = {United States},
year = {2017},
month = {1}
}