OSTI.GOV U.S. Department of Energy
Office of Scientific and Technical Information

Title: COFFEA - Columnar Object Framework For Effective Analysis

Abstract

The COFFEA Framework provides a new approach to HEP analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language and commodity big data technologies such as Apache Spark and NoSQL databases. To achieve this suite of improvements across many use cases, COFFEA takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will present published results from analysis of CMS data using the COFFEA framework along with a discussion of metrics and the user experience of arriving at those results with columnar analysis.

Authors:
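Smith, Nick [1]; Gray, Lindsey [1]; Cremonesi, Matteo [1]; Jayatilaka, Bo [1]; Gutsche, Oliver [1]; Hall, Allison [1]; Pedro, Kevin [1]; Acosta, Maria [1]; Melo, Andrew [2]; Belforte, Stefano [3]; Pivarski, Jim [4]; Galewsky, Ben [5]; Neubauer, Mark [6]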
  1. Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
  2. Vanderbilt Univ., Nashville, TN (United States)
  3. National Inst. of Nuclear Physics (INFN), Trieste (Italy)
  4. Princeton Univ., NJ (United States)
  5. Univ. of Illinois at Urbana-Champaign, IL (United States). National Center for Supercomputing Applications (NCSA)
  6. Univ. of Illinois at Urbana-Champaign, IL (United States)
Publication Date:
Research Org.:
Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP)
OSTI Identifier:
1633739
Report Number(s):
FERMILAB-SLIDES-19-714-SCD
oai:inspirehep.net:1801677
DOE Contract Number:  
AC02-07CH11359
Resource Type:
Technical Report
Country of Publication:
United States
Language:
English

Citation Formats

Smith, Nick, Gray, Lindsey, Cremonesi, Matteo, Jayatilaka, Bo, Gutsche, Oliver, Hall, Allison, Pedro, Kevin, Acosta, Maria, Melo, Andrew, Belforte, Stefano, Pivarski, Jim, Galewsky, Ben, and Neubauer, Mark. COFFEA - Columnar Object Framework For Effective Analysis. United States: N. p., 2019. Web. doi:10.5281/zenodo.3598789.
Smith, Nick, Gray, Lindsey, Cremonesi, Matteo, Jayatilaka, Bo, Gutsche, Oliver, Hall, Allison, Pedro, Kevin, Acosta, Maria, Melo, Andrew, Belforte, Stefano, Pivarski, Jim, Galewsky, Ben, & Neubauer, Mark. COFFEA - Columnar Object Framework For Effective Analysis. United States. https://doi.org/10.5281/zenodo.3598789
Smith, Nick, Gray, Lindsey, Cremonesi, Matteo, Jayatilaka, Bo, Gutsche, Oliver, Hall, Allison, Pedro, Kevin, Acosta, Maria, Melo, Andrew, Belforte, Stefano, Pivarski, Jim, Galewsky, Ben, and Neubauer, Mark. Mon . "COFFEA - Columnar Object Framework For Effective Analysis". United States. https://doi.org/10.5281/zenodo.3598789. https://www.osti.gov/servlets/purl/1633739.
@article{osti_1633739,
title = {COFFEA - Columnar Object Framework For Effective Analysis},
author = {Smith, Nick and Gray, Lindsey and Cremonesi, Matteo and Jayatilaka, Bo and Gutsche, Oliver and Hall, Allison and Pedro, Kevin and Acosta, Maria and Melo, Andrew and Belforte, Stefano and Pivarski, Jim and Galewsky, Ben and Neubauer, Mark},
abstractNote = {The COFFEA Framework provides a new approach to HEP analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language and commodity big data technologies such as Apache Spark and NoSQL databases. To achieve this suite of improvements across many use cases, COFFEA takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will present published results from analysis of CMS data using the COFFEA framework along with a discussion of metrics and the user experience of arriving at those results with columnar analysis.},
doi = {10.5281/zenodo.3598789},
url = {https://www.osti.gov/biblio/1633739},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2019},
month = {11}
}