Python and HPC for High Energy Physics Data Analyses
- Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python that are useful and efficient for end-user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow selective reprocessing of only parts of a generally time-consuming workflow. Algorithms also tend to be modular, because most detectors are heterogeneous and different parts of the detector must be analyzed separately before the information is combined. This fragmentation causes difficulties for interactive data analysis. As data sets increase in size and complexity (from O(10) TiB for a “small” neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools suggest a direction by showing what can be done when an entire data set is available across a system and analyzed with high-level programming abstractions. However, they are not designed with scientific computing in general in mind, nor with modern HPC platform features, such as data caching levels, in particular. Our example HPC use case is a search for a new elementary particle that might explain the phenomenon known as “Dark Matter”. Using data from the CMS detector, we use HDF5 as our input data format and MPI with Python to implement this use case.
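The abstract names HDF5 input and MPI with Python but does not give the implementation. The following is only a minimal sketch of one common pattern for that combination: each rank takes a contiguous block of events, applies a selection, and the per-rank counts are summed. The names `block_range`, `select_and_count`, and `simulate_ranks`, and the single-threshold selection, are hypothetical illustrations, not the paper's code. The ranks are simulated in a plain loop so that the sketch runs without an MPI installation.

```python
def block_range(n_events, rank, size):
    """Return the half-open [start, stop) block of events owned by `rank`.

    Events are split as evenly as possible; the first `n_events % size`
    ranks each receive one extra event.
    """
    base, extra = divmod(n_events, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def select_and_count(values, threshold):
    """Hypothetical selection: count events whose value exceeds `threshold`."""
    return sum(1 for v in values if v > threshold)


def simulate_ranks(values, size, threshold):
    """Emulate `size` ranks in a loop and sum their partial counts.

    In a real MPI job (an assumption, not shown in the source), `rank` and
    `size` would come from mpi4py's MPI.COMM_WORLD.Get_rank() and
    Get_size(), each rank would read only its own slice of an HDF5 dataset
    (for example dataset[start:stop] with h5py), and the final sum would be
    an allreduce with MPI.SUM instead of this Python-level sum.
    """
    partial_counts = []
    for rank in range(size):
        start, stop = block_range(len(values), rank, size)
        partial_counts.append(select_and_count(values[start:stop], threshold))
    return sum(partial_counts)
```

Contiguous blocks are used here because they map directly onto slice reads of a columnar dataset; other decompositions are equally possible.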
- Research Organization:
- Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), High Energy Physics (HEP)
- Grant/Contract Number:
- AC02-07CH11359
- OSTI ID:
- 1413085
- Report Number(s):
- FERMILAB-CONF-17-437-CD; 1642376
- Country of Publication:
- United States
- Language:
- English