Python and HPC for High Energy Physics Data Analyses
Abstract
High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be available across a system and analysed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.
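The columnar, high-level style of analysis the abstract alludes to can be sketched with NumPy boolean masks. This is a minimal illustration, not the authors' code: the column names, cut values, and simulated data below are hypothetical, and the paper itself reads such columns from HDF5 files in parallel via MPI, which this standalone sketch omits.

```python
import numpy as np

# Hypothetical event-level columns; in the paper's workflow these would be
# read from an HDF5 file rather than generated here.
rng = np.random.default_rng(42)
n_events = 1_000_000
met = rng.exponential(scale=50.0, size=n_events)     # missing transverse energy [GeV]
jet_pt = rng.exponential(scale=80.0, size=n_events)  # leading-jet pT [GeV]

# A dark-matter-search-style selection expressed as vectorized boolean
# masks over whole columns, rather than an explicit per-event loop.
selection = (met > 200.0) & (jet_pt > 100.0)
selected_met = met[selection]

print(f"{selection.sum()} of {n_events} events pass the cuts")
```

Because each cut is a whole-array operation, the expensive work runs in optimized native code, which is the property that makes this style attractive for large HEP data sets.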
- Authors:
- Sehrish, S.; Kowalkowski, J.; Paterno, M.; Green, C.
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Publication Date:
- 2017
- Research Org.:
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
- OSTI Identifier:
- 1413085
- Report Number(s):
- FERMILAB-CONF-17-437-CD; 1642376
- Grant/Contract Number:
- AC02-07CH11359
- Resource Type:
- Journal Article: Accepted Manuscript
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 71 CLASSICAL AND QUANTUM MECHANICS, GENERAL PHYSICS
Citation Formats
Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. Python and HPC for High Energy Physics Data Analyses. United States: N. p., 2017.
Web. doi:10.1145/3149869.3149877.
Sehrish, S., Kowalkowski, J., Paterno, M., & Green, C. Python and HPC for High Energy Physics Data Analyses. United States. https://doi.org/10.1145/3149869.3149877
Sehrish, S., Kowalkowski, J., Paterno, M., and Green, C. 2017.
"Python and HPC for High Energy Physics Data Analyses". United States. https://doi.org/10.1145/3149869.3149877. https://www.osti.gov/servlets/purl/1413085.
@article{osti_1413085,
title = {Python and HPC for High Energy Physics Data Analyses},
author = {Sehrish, S. and Kowalkowski, J. and Paterno, M. and Green, C.},
abstractNote = {High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set can be available across a system and analysed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as “Dark Matter”. Here, using data from the CMS detector, we will use HDF5 as our input data format, and MPI with Python to implement our use case.},
doi = {10.1145/3149869.3149877},
url = {https://www.osti.gov/biblio/1413085},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2017},
month = {1}
}
Works referenced in this record:
Spark and HPC for High Energy Physics Data Analyses
conference, May 2017
- Sehrish, Saba; Kowalkowski, Jim; Paterno, Marc
- 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC
journal, September 2012
- Chatrchyan, S.; Khachatryan, V.; Sirunyan, A. M.
- Physics Letters B, Vol. 716, Issue 1
ROOT — An object oriented data analysis framework
journal, April 1997
- Brun, Rene; Rademakers, Fons
- Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Vol. 389, Issue 1-2
Works referencing / citing this record:
Waveform Signal Entropy and Compression Study of Whole-Building Energy Datasets
conference, January 2019
- Kriechbaumer, Thomas; Jorde, Daniel; Jacobsen, Hans-Arno
- Proceedings of the Tenth ACM International Conference on Future Energy Systems - e-Energy '19