DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

Abstract

We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to 10 fb$^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.
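As a rough illustration of how an anomaly score can define a signal region, the sketch below fits a simple density model to background-only events, scores every event, and places a cut at a fixed background efficiency. It is not the method of any challenge submission: the independent-Gaussian model, the two toy features, the 1% efficiency, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def fit_background(events):
    """Per-feature mean and standard deviation of background-only events."""
    columns = list(zip(*events))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def anomaly_score(event, means, stds):
    """Sum of squared z-scores: the negative log-density of an
    independent-Gaussian model, up to a constant and a factor of 2."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(event, means, stds))


def threshold_at_background_efficiency(scores, eff):
    """Score cut that keeps roughly a fraction `eff` of background events."""
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, int(eff * len(ranked)))
    return ranked[n_keep - 1]


# Toy data (invented for illustration): background near the origin,
# a hypothetical signal displaced from it.
rng = random.Random(0)
background = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10000)]
signal = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(100)]

means, stds = fit_background(background)
bkg_scores = [anomaly_score(e, means, stds) for e in background]
cut = threshold_at_background_efficiency(bkg_scores, eff=0.01)


def in_signal_region(event):
    """An event enters the signal region if its score is at or above the cut."""
    return anomaly_score(event, means, stds) >= cut


bkg_eff = sum(map(in_signal_region, background)) / len(background)
sig_eff = sum(map(in_signal_region, signal)) / len(signal)
print(f"cut={cut:.2f}  background eff={bkg_eff:.3f}  signal eff={sig_eff:.3f}")
```

The cut is derived from background events alone, so the signal region is defined without reference to any particular signal model; the toy signal is used only to check how much of it the region would retain.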

Authors:
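Aarrestad, Thea [1]; van Beekveld, Melissa [2]; Bona, Marcella [3]; Boveia, Antonio [4]; Caron, Sascha [5]; Davies, Joe [3]; de Simone, Andrea [6]; Doglioni, Caterina [7]; Duarte, Javier [8]; Farbin, Amir [9]; Gupta, Honey [10]; Hendriks, Luc [5]; Heinrich, Lukas A. [1]; Howarth, James [11]; Jawahar, Pratik [12]; Jueid, Adil [13]; Lastow, Jessica [7]; Leinweber, Adam [14]; Mamuzic, Judita [15]; Merényi, Erzsébet [16]; Morandini, Alessandro [17]; Moskvitina, Polina [5]; Nellist, Clara [5]; Ngadiuba, Jennifer [18]; Ostdiek, Bryan [19]; Pierini, Maurizio [1]; Ravina, Baptiste [11]; Ruiz de Austri, Roberto [15]; Sekmen, Sezen [20]; Touranakou, Mary [21]; Vaškeviciute, Marija [11]; Vilalta, Ricardo [22]; Vlimant, Jean-Roch [23]; Verheyen, Rob [24]; White, Martin [14]; Wulff, Eric [7]; Wallin, Erik [7]; Wozniak, Kinga A. [25]; Zhang, Zhongyi [5]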
  1. European Organization for Nuclear Research
  2. Rudolf Peierls Centre for Theoretical Physics, University of Oxford
  3. Queen Mary University of London
  4. The Ohio State University
  5. National Institute for Subatomic Physics
  6. International School for Advanced Studies, National Institute for Nuclear Physics
  7. Lund University
  8. University of California, San Diego
  9. The University of Texas at Arlington
  10. Google
  11. University of Glasgow
  12. European Organization for Nuclear Research, Worcester Polytechnic Institute
  13. Konkuk University
  14. University of Adelaide
  15. Institute for Corpuscular Physics
  16. Rice University
  17. RWTH Aachen University
  18. California Institute of Technology, Fermi National Accelerator Laboratory
  19. Harvard University, The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
  20. Kyungpook National University
  21. European Organization for Nuclear Research, National and Kapodistrian University of Athens
  22. University of Houston
  23. California Institute of Technology
  24. University College London
  25. University of Vienna, European Organization for Nuclear Research
Publication Date:
2022-01-28
Research Org.:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), High Energy Physics (HEP); European Research Council (ERC); Australian Research Council (ARC); Science and Technology Facilities Council (STFC); National Research Foundation of Korea (NRF); National Science Foundation (NSF)
OSTI Identifier:
1842562
Alternate Identifier(s):
OSTI ID: 1824176
Report Number(s):
FERMILAB-PUB-21-285-CMS; arXiv:2105.14027
Journal ID: ISSN 2542-4653; 043
Grant/Contract Number:  
AC02-07CH11359; SC0011726; SC0011925; SC0013607; SC0019227; SC0021187; SC0021396; ST/T000864/1; 2019R1A2C1009419; URF\R1\191524; PHY-2019786; 772369; DP180102209; CE200100008; 788223; ST/P000274/1
Resource Type:
Published Article
Journal Name:
SciPost Physics
Additional Journal Information:
Journal Name: SciPost Physics; Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 2542-4653
Publisher:
Stichting SciPost
Country of Publication:
Netherlands
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS

Citation Formats

Aarrestad, Thea, van Beekveld, Melissa, Bona, Marcella, Boveia, Antonio, Caron, Sascha, Davies, Joe, de Simone, Andrea, Doglioni, Caterina, Duarte, Javier, Farbin, Amir, Gupta, Honey, Hendriks, Luc, Heinrich, Lukas A., Howarth, James, Jawahar, Pratik, Jueid, Adil, Lastow, Jessica, Leinweber, Adam, Mamuzic, Judita, Merényi, Erzsébet, Morandini, Alessandro, Moskvitina, Polina, Nellist, Clara, Ngadiuba, Jennifer, Ostdiek, Bryan, Pierini, Maurizio, Ravina, Baptiste, Ruiz de Austri, Roberto, Sekmen, Sezen, Touranakou, Mary, Vaškeviciute, Marija, Vilalta, Ricardo, Vlimant, Jean-Roch, Verheyen, Rob, White, Martin, Wulff, Eric, Wallin, Erik, Wozniak, Kinga A., and Zhang, Zhongyi. The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider. Netherlands: N. p., 2022. Web. doi:10.21468/SciPostPhys.12.1.043.
Aarrestad, Thea, van Beekveld, Melissa, Bona, Marcella, Boveia, Antonio, Caron, Sascha, Davies, Joe, de Simone, Andrea, Doglioni, Caterina, Duarte, Javier, Farbin, Amir, Gupta, Honey, Hendriks, Luc, Heinrich, Lukas A., Howarth, James, Jawahar, Pratik, Jueid, Adil, Lastow, Jessica, Leinweber, Adam, Mamuzic, Judita, Merényi, Erzsébet, Morandini, Alessandro, Moskvitina, Polina, Nellist, Clara, Ngadiuba, Jennifer, Ostdiek, Bryan, Pierini, Maurizio, Ravina, Baptiste, Ruiz de Austri, Roberto, Sekmen, Sezen, Touranakou, Mary, Vaškeviciute, Marija, Vilalta, Ricardo, Vlimant, Jean-Roch, Verheyen, Rob, White, Martin, Wulff, Eric, Wallin, Erik, Wozniak, Kinga A., & Zhang, Zhongyi. The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider. Netherlands. https://doi.org/10.21468/SciPostPhys.12.1.043
Aarrestad, Thea, van Beekveld, Melissa, Bona, Marcella, Boveia, Antonio, Caron, Sascha, Davies, Joe, de Simone, Andrea, Doglioni, Caterina, Duarte, Javier, Farbin, Amir, Gupta, Honey, Hendriks, Luc, Heinrich, Lukas A., Howarth, James, Jawahar, Pratik, Jueid, Adil, Lastow, Jessica, Leinweber, Adam, Mamuzic, Judita, Merényi, Erzsébet, Morandini, Alessandro, Moskvitina, Polina, Nellist, Clara, Ngadiuba, Jennifer, Ostdiek, Bryan, Pierini, Maurizio, Ravina, Baptiste, Ruiz de Austri, Roberto, Sekmen, Sezen, Touranakou, Mary, Vaškeviciute, Marija, Vilalta, Ricardo, Vlimant, Jean-Roch, Verheyen, Rob, White, Martin, Wulff, Eric, Wallin, Erik, Wozniak, Kinga A., and Zhang, Zhongyi. 2022. "The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider". Netherlands. https://doi.org/10.21468/SciPostPhys.12.1.043.
@article{osti_1842562,
title = {The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider},
author = {Aarrestad, Thea and van Beekveld, Melissa and Bona, Marcella and Boveia, Antonio and Caron, Sascha and Davies, Joe and de Simone, Andrea and Doglioni, Caterina and Duarte, Javier and Farbin, Amir and Gupta, Honey and Hendriks, Luc and Heinrich, Lukas A. and Howarth, James and Jawahar, Pratik and Jueid, Adil and Lastow, Jessica and Leinweber, Adam and Mamuzic, Judita and Merényi, Erzsébet and Morandini, Alessandro and Moskvitina, Polina and Nellist, Clara and Ngadiuba, Jennifer and Ostdiek, Bryan and Pierini, Maurizio and Ravina, Baptiste and Ruiz de Austri, Roberto and Sekmen, Sezen and Touranakou, Mary and Vaškeviciute, Marija and Vilalta, Ricardo and Vlimant, Jean-Roch and Verheyen, Rob and White, Martin and Wulff, Eric and Wallin, Erik and Wozniak, Kinga A. and Zhang, Zhongyi},
abstractNote = {We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to 10 fb$^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.},
doi = {10.21468/SciPostPhys.12.1.043},
journal = {SciPost Physics},
number = 1,
volume = 12,
place = {Netherlands},
year = {2022},
month = {1}
}

Works referenced in this record:

New-$Z'$ phenomenology
journal, January 1991


QCD or what?
journal, January 2019


Search for high mass dijet resonances with a new background prediction method in proton-proton collisions at $\sqrt{s} = 13$ TeV
journal, May 2020

  • Sirunyan, A. M.; Tumasyan, A.; Adam, W.
  • Journal of High Energy Physics, Vol. 2020, Issue 5
  • DOI: 10.1007/JHEP05(2020)033

Quasi-model-independent search for new physics at large transverse momentum
journal, June 2001


LHAPDF6: parton density access in the LHC precision era
journal, March 2015


Nonlinear principal component analysis using autoassociative neural networks
journal, February 1991


Energy flow networks: deep sets for particle jets
journal, January 2019

  • Komiske, Patrick T.; Metodiev, Eric M.; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2019, Issue 1
  • DOI: 10.1007/JHEP01(2019)121

Self-organized formation of topologically correct feature maps
journal, January 1982


Learning new physics from a machine
journal, January 2019


Adversarially-trained autoencoders for robust unsupervised new physics searches
journal, October 2019

  • Blance, Andrew; Spannowsky, Michael; Waite, Philip
  • Journal of High Energy Physics, Vol. 2019, Issue 10
  • DOI: 10.1007/JHEP10(2019)047

The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations
journal, July 2014

  • Alwall, J.; Frederix, R.; Frixione, S.
  • Journal of High Energy Physics, Vol. 2014, Issue 7
  • DOI: 10.1007/JHEP07(2014)079

FastJet user manual: (for version 3.0.2)
journal, March 2012


A comparison of optimisation algorithms for high-dimensional particle and astrophysics applications
journal, May 2021

  • Balázs, Csaba; van Beekveld, Melissa; Caron, Sascha
  • Journal of High Energy Physics, Vol. 2021, Issue 5
  • DOI: 10.1007/JHEP05(2021)108

Long-lived heavy neutrinos from Higgs decays
journal, August 2018

  • Deppisch, Frank F.; Liu, Wei; Mitra, Manimala
  • Journal of High Energy Physics, Vol. 2018, Issue 8
  • DOI: 10.1007/JHEP08(2018)181

A general search for new phenomena at HERA
journal, April 2009


Mixture Models: Inference and Applications to Clustering.
journal, March 1989

  • Lindsay, Bruce; McLachlan, G. L.; Basford, K. E.
  • Journal of the American Statistical Association, Vol. 84, Issue 405
  • DOI: 10.2307/2289892

Anomaly detection with density estimation
journal, April 2020


Density‐based clustering
journal, April 2011

  • Kriegel, Hans‐Peter; Kröger, Peer; Sander, Jörg
  • WIREs Data Mining and Knowledge Discovery, Vol. 1, Issue 3
  • DOI: 10.1002/widm.30

The search for supersymmetry: Probing physics beyond the standard model
journal, January 1985


New Physics Agnostic Selections For New Physics Searches
journal, January 2020


Nonparametric density estimation for high‐dimensional data—Algorithms and applications
journal, April 2019

  • Wang, Zhipeng; Scott, David W.
  • WIREs Computational Statistics, Vol. 11, Issue 4
  • DOI: 10.1002/wics.1461

Parton distributions from high-precision collider data: NNPDF Collaboration
journal, October 2017


Recursive jigsaw reconstruction: HEP event analysis in the presence of kinematic and combinatoric ambiguities
journal, December 2017


Combining outlier analysis algorithms to identify new physics at the LHC
journal, September 2021

  • van Beekveld, Melissa; Caron, Sascha; Hendriks, Luc
  • Journal of High Energy Physics, Vol. 2021, Issue 9
  • DOI: 10.1007/JHEP09(2021)024

Extending the search for new resonances with machine learning
journal, January 2019


Reducing the Dimensionality of Data with Neural Networks
journal, July 2006


Phenomenology of the minimal $B-L$ extension of the standard model: $Z'$ and neutrinos
journal, September 2009


LHCsimulationProject
dataset, January 2020


Identification of point sources in gamma rays using U-shaped convolutional neural networks and a data challenge
journal, November 2021


Novelty detection meets collider physics
journal, April 2020


Adversarially Learned Anomaly Detection on CMS open data: re-discovering the top quark
journal, February 2021


Complete set of Feynman rules for the minimal supersymmetric extension of the standard model
journal, June 1990


Unsupervised-Hackathon
dataset, January 2020


Topological obstructions to autoencoding
journal, April 2021

  • Batson, Joshua; Haaf, C. Grace; Kahn, Yonatan
  • Journal of High Energy Physics, Vol. 2021, Issue 4
  • DOI: 10.1007/JHEP04(2021)280

The Elements of Statistical Learning
book, January 2009


Model-independent and quasi-model-independent search for new physics at CDF
journal, July 2008


Review of Particle Physics
journal, August 2020

  • Zyla, P. A.; Barnett, R. M.; Beringer, J.
  • Progress of Theoretical and Experimental Physics, Vol. 2020, Issue 8
  • DOI: 10.1093/ptep/ptaa104

Evidence for Jet Structure in Hadron Production by $e^+e^-$ Annihilation
journal, December 1975


Phase space sampling and inference from weighted events with autoregressive flows
journal, January 2021


(Machine) learning to do more with less
journal, February 2018

  • Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan
  • Journal of High Energy Physics, Vol. 2018, Issue 2
  • DOI: 10.1007/JHEP02(2018)034

Classification without labels: learning from mixed samples in high energy physics
journal, October 2017

  • Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2017, Issue 10
  • DOI: 10.1007/JHEP10(2017)174

DarkMachines secret dataset
dataset, January 2021


Variational autoencoders for new physics mining at the Large Hadron Collider
journal, May 2019

  • Cerri, Olmo; Nguyen, Thong Q.; Pierini, Maurizio
  • Journal of High Energy Physics, Vol. 2019, Issue 5
  • DOI: 10.1007/JHEP05(2019)036

DELPHES 3: a modular framework for fast simulation of a generic collider experiment
journal, February 2014

  • de Favereau, J.; Delaere, C.; Demin, P.
  • Journal of High Energy Physics, Vol. 2014, Issue 2
  • DOI: 10.1007/JHEP02(2014)057

R-Parity-violating supersymmetry
journal, November 2005


Weakly supervised classification in high energy physics
journal, May 2017

  • Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco
  • Journal of High Energy Physics, Vol. 2017, Issue 5
  • DOI: 10.1007/JHEP05(2017)145

Anomaly Detection for Resonant New Physics with Machine Learning
journal, December 2018


Isolation Forest
conference, December 2008

  • Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua
  • 2008 Eighth IEEE International Conference on Data Mining (ICDM)
  • DOI: 10.1109/ICDM.2008.17

Beyond the Minimal Supersymmetric Standard Model: from Theory to Phenomenology
journal, March 2012


Density Estimation for Statistics and Data Analysis
book, January 1998


Extracting and composing robust features with denoising autoencoders
conference, January 2008

  • Vincent, Pascal; Larochelle, Hugo; Bengio, Yoshua
  • Proceedings of the 25th international conference on Machine learning - ICML '08
  • DOI: 10.1145/1390156.1390294

Trial factors for the look elsewhere effect in high energy physics
journal, October 2010


Better latent spaces for better autoencoders
journal, January 2021


On measuring the masses of pair-produced semi-invisibly decaying particles at hadron colliders
journal, April 2008


MUSiC: a model-unspecific search for new physics in proton–proton collisions at $\sqrt{s} = 13\,\text{TeV}$
journal, July 2021


Guiding new physics searches with unsupervised learning
journal, March 2019


ALPGEN, a generator for hard multiparton processes in hadronic collisions
journal, July 2003

  • Mangano, Michelangelo L.; Piccinini, Fulvio; Polosa, Antonio D.
  • Journal of High Energy Physics, Vol. 2003, Issue 07
  • DOI: 10.1088/1126-6708/2003/07/001

Searching for new physics with deep autoencoders
journal, April 2020


Global search for new physics with 2.0 fb$^{-1}$ at CDF
journal, January 2009


The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics
journal, December 2021

  • Kasieczka, Gregor; Nachman, Benjamin; Shih, David
  • Reports on Progress in Physics, Vol. 84, Issue 12
  • DOI: 10.1088/1361-6633/ac36b9

An introduction to PYTHIA 8.2
journal, June 2015

  • Sjöstrand, Torbjörn; Ask, Stefan; Christiansen, Jesper R.
  • Computer Physics Communications, Vol. 191
  • DOI: 10.1016/j.cpc.2015.01.024

A general search for new phenomena in ep scattering at HERA
journal, November 2004


Neural networks and principal component analysis: Learning from examples without local minima
journal, January 1989


Simplest $Z'$ model
journal, October 1991


Asymptotic formulae for likelihood-based tests of new physics
journal, February 2011


Quasi-Model-Independent Search for New High $p_T$ Physics at D0
journal, April 2001


Supersymmetry, supergravity and particle physics
journal, August 1984


LOF: identifying density-based local outliers
journal, June 2000

  • Breunig, Markus M.; Kriegel, Hans-Peter; Ng, Raymond T.
  • ACM SIGMOD Record, Vol. 29, Issue 2
  • DOI: 10.1145/335191.335388

Clustering high dimensional data
journal, June 2012

  • Assent, Ira
  • WIREs Data Mining and Knowledge Discovery, Vol. 2, Issue 4
  • DOI: 10.1002/widm.1062

A strategy for a general search for new phenomena using data-derived signal regions and its application within the ATLAS experiment
journal, February 2019


Dijet Resonance Search with Weak Supervision Using $\sqrt{s} = 13$ TeV $pp$ Collisions in the ATLAS Detector
journal, September 2020


Search for new physics in $e\mu X$ data at DØ using SLEUTH: A quasi-model-independent search strategy for new physics
journal, October 2000


Representation Learning: A Review and New Perspectives
journal, August 2013

  • Bengio, Y.; Courville, A.; Vincent, P.
  • IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, Issue 8
  • DOI: 10.1109/TPAMI.2013.50