DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Novel functional and distributed approaches to data analysis available in ROOT

Abstract

The bright future of particle physics at the Energy and Intensity frontiers poses exciting challenges to the scientific software community. The traditional strategies for processing and analyzing data are evolving in order to (i) offer higher-level programming models and (ii) exploit parallelism to cope with the ever-increasing complexity and size of the datasets. This contribution describes how the ROOT framework, a cornerstone of software stacks dedicated to particle physics, is preparing to provide adequate solutions for the analysis of large amounts of scientific data on parallel architectures. The functional approach to parallel data analysis provided by the ROOT TDataFrame interface is then characterized. The design choices behind this new interface are described and compared with other widely adopted tools such as Pandas and Apache Spark. The programming model is illustrated, highlighting the reduction of boilerplate code, the composability of actions and data transformations, and the ability to deal with different data sources such as ROOT, JSON, CSV or databases. Details are given about how the functional approach allows transparent implicit parallelization of the chain of operations specified by the user. The progress made in the field of distributed analysis is examined. In particular, the power of the integration of ROOT with Apache Spark via the PyROOT interface is shown. In addition, the building blocks for the expression of parallelism in ROOT are briefly characterized, together with the structural changes to the build and testing infrastructure that were necessary to put them in production.

Authors:
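 Amadio, G. [1]; Blomer, J. [1]; Canal, P. [2]; Ganis, G. [1]; Guiraud, E. [3]; Mato Vila, P. [1]; Moneta, L. [1]; Piparo, D. [1]; Tejedor, E. [1]; Valls Pla, X. [4]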
  1. European Organization for Nuclear Research (CERN), Geneva (Switzerland)
  2. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  3. European Organization for Nuclear Research (CERN), Geneva (Switzerland); Univ. of Oldenburg, Oldenburg (Germany)
  4. European Organization for Nuclear Research (CERN), Geneva (Switzerland); Univ. Jaume I, Castellon (Spain)
Publication Date:
2018-10-18
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP)
OSTI Identifier:
1523435
Report Number(s):
FERMILAB-CONF-18-750-CD
Journal ID: ISSN 1742-6588; 1699879
Grant/Contract Number:  
AC02-07CH11359
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Physics. Conference Series
Additional Journal Information:
Journal Volume: 1085; Journal Issue: 4; Conference: 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Seattle, WA (United States), 21-25 Aug 2017; Journal ID: ISSN 1742-6588
Publisher:
IOP Publishing
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS

Citation Formats

Amadio, G., Blomer, J., Canal, P., Ganis, G., Guiraud, E., Mato Vila, P., Moneta, L., Piparo, D., Tejedor, E., and Valls Pla, X. Novel functional and distributed approaches to data analysis available in ROOT. United States: N. p., 2018. Web. doi:10.1088/1742-6596/1085/4/042008.
Amadio, G., Blomer, J., Canal, P., Ganis, G., Guiraud, E., Mato Vila, P., Moneta, L., Piparo, D., Tejedor, E., & Valls Pla, X. Novel functional and distributed approaches to data analysis available in ROOT. United States. https://doi.org/10.1088/1742-6596/1085/4/042008
Amadio, G., Blomer, J., Canal, P., Ganis, G., Guiraud, E., Mato Vila, P., Moneta, L., Piparo, D., Tejedor, E., and Valls Pla, X. 2018. "Novel functional and distributed approaches to data analysis available in ROOT". United States. https://doi.org/10.1088/1742-6596/1085/4/042008. https://www.osti.gov/servlets/purl/1523435.
@article{osti_1523435,
title = {Novel functional and distributed approaches to data analysis available in ROOT},
author = {Amadio, G. and Blomer, J. and Canal, P. and Ganis, G. and Guiraud, E. and Mato Vila, P. and Moneta, L. and Piparo, D. and Tejedor, E. and Valls Pla, X.},
abstractNote = {The bright future of particle physics at the Energy and Intensity frontiers poses exciting challenges to the scientific software community. The traditional strategies for processing and analyzing data are evolving in order to (i) offer higher-level programming models and (ii) exploit parallelism to cope with the ever-increasing complexity and size of the datasets. This contribution describes how the ROOT framework, a cornerstone of software stacks dedicated to particle physics, is preparing to provide adequate solutions for the analysis of large amounts of scientific data on parallel architectures. The functional approach to parallel data analysis provided by the ROOT TDataFrame interface is then characterized. The design choices behind this new interface are described and compared with other widely adopted tools such as Pandas and Apache Spark. The programming model is illustrated, highlighting the reduction of boilerplate code, the composability of actions and data transformations, and the ability to deal with different data sources such as ROOT, JSON, CSV or databases. Details are given about how the functional approach allows transparent implicit parallelization of the chain of operations specified by the user. The progress made in the field of distributed analysis is examined. In particular, the power of the integration of ROOT with Apache Spark via the PyROOT interface is shown. In addition, the building blocks for the expression of parallelism in ROOT are briefly characterized, together with the structural changes to the build and testing infrastructure that were necessary to put them in production.},
doi = {10.1088/1742-6596/1085/4/042008},
journal = {Journal of Physics. Conference Series},
number = 4,
volume = 1085,
place = {United States},
year = {2018},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: Complex control flows, for example a series of cuts on collision events, can be concisely and expressively represented with TDataFrame.
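
As a rough illustration of the chained-cuts idea in the Figure 1 caption, the following is a minimal pure-Python sketch of a declarative, lazily evaluated chain of filters and derived columns. It is not the ROOT TDataFrame API: the class `LazyFrame`, its methods `filter`, `define`, `count` and `take`, and the sample events with `pt` and `eta` fields are all hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Event = Dict[str, Any]
Step = Tuple[str, Any]


class LazyFrame:
    """Toy stand-in for a declarative analysis chain.

    Hypothetical illustration only; this is NOT the ROOT TDataFrame API.
    Transformations are recorded, and the event loop runs only when an
    action (count or take) is requested.
    """

    def __init__(self, events: Iterable[Event], steps: Optional[List[Step]] = None):
        # `events` should be re-iterable (e.g. a list) if several actions are run.
        self._events = events
        self._steps: List[Step] = list(steps or [])

    def filter(self, predicate: Callable[[Event], bool]) -> "LazyFrame":
        """Record a cut; returns a new node, nothing is evaluated yet."""
        return LazyFrame(self._events, self._steps + [("filter", predicate)])

    def define(self, name: str, expr: Callable[[Event], Any]) -> "LazyFrame":
        """Record a derived column computed from the other columns."""
        return LazyFrame(self._events, self._steps + [("define", (name, expr))])

    def _run(self) -> Iterator[Event]:
        # Single pass over the events, applying the recorded steps in order.
        for event in self._events:
            row = dict(event)  # copy so the input events are left untouched
            keep = True
            for kind, payload in self._steps:
                if kind == "filter":
                    if not payload(row):
                        keep = False
                        break
                else:
                    name, expr = payload
                    row[name] = expr(row)
            if keep:
                yield row

    def count(self) -> int:
        """Action: number of events passing all cuts."""
        return sum(1 for _ in self._run())

    def take(self, column: str) -> List[Any]:
        """Action: values of one column for the events passing all cuts."""
        return [row[column] for row in self._run()]


# Made-up sample events, for illustration only.
events = [
    {"pt": 12.0, "eta": 0.5},
    {"pt": 35.0, "eta": 2.9},
    {"pt": 48.0, "eta": -1.1},
    {"pt": 60.0, "eta": 0.2},
]

selected = (
    LazyFrame(events)
    .filter(lambda e: e["pt"] > 20.0)
    .define("abs_eta", lambda e: abs(e["eta"]))
    .filter(lambda e: e["abs_eta"] < 2.5)
)

n_selected = selected.count()
pts = selected.take("pt")
```

Because each step only describes what to compute, a framework built along these lines is free to decide how the event loop is executed, which is the property the abstract associates with transparent implicit parallelization. This sketch runs the loop sequentially.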
