DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The Dark Machines Anomaly Score Challenge: Benchmark Data and Model Independent Event Classification for the Large Hadron Collider

Journal Article · SciPost Physics
 [1]; [2]; [3]; [4]; [5]; [3]; [6]; [7]; [8]; [9]; [10]; [5]; [1]; [11]; [12]; [13]; [7]; [14]; [15]; [16]; [17]; [5]; [5]; [18]; [19]; [1]; [11]; [15]; [20]; [21]; [11]; [22]; [23]; [24]; [14]; [7]; [7]; [25]; [5]
  1. European Organization for Nuclear Research
  2. Rudolf Peierls Centre for Theoretical Physics, University of Oxford
  3. Queen Mary University of London
  4. The Ohio State University
  5. National Institute for Subatomic Physics
  6. International School for Advanced Studies, National Institute for Nuclear Physics
  7. Lund University
  8. University of California, San Diego
  9. The University of Texas at Arlington
  10. Google
  11. University of Glasgow
  12. European Organization for Nuclear Research, Worcester Polytechnic Institute
  13. Konkuk University
  14. University of Adelaide
  15. Institute for Corpuscular Physics
  16. Rice University
  17. RWTH Aachen University
  18. California Institute of Technology, Fermi National Accelerator Laboratory
  19. Harvard University, The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
  20. Kyungpook National University
  21. European Organization for Nuclear Research, National and Kapodistrian University of Athens
  22. University of Houston
  23. California Institute of Technology
  24. University College London
  25. University of Vienna, European Organization for Nuclear Research

We describe the outcome of a data challenge conducted as part of the Dark Machines (https://www.darkmachines.org) initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims to detect signals of new physics at the Large Hadron Collider (LHC) using unsupervised machine learning algorithms. First, we propose how an anomaly score could be implemented to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of >1 billion simulated LHC events corresponding to $10\,\mathrm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms, developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.
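
As a rough illustration of the idea of using an anomaly score to define a signal region, the following minimal sketch fits a one-dimensional Gaussian to background-only events, scores each event by its squared standardized distance from the background mean, and places a cut at a fixed background efficiency. Everything in it is an assumption made for illustration: the toy Gaussian samples, the single feature, the 1% efficiency, and the helper names (`fit_background`, `anomaly_score`, `threshold_at_background_efficiency`, `in_signal_region`) are hypothetical and are not taken from the paper, its dataset, or its code repository.

```python
import random
import statistics

def fit_background(values):
    """Estimate mean and standard deviation of one feature from background-only events."""
    return statistics.fmean(values), statistics.stdev(values)

def anomaly_score(x, mean, std):
    """Squared standardized distance from the background mean (larger = less background-like)."""
    return ((x - mean) / std) ** 2

def threshold_at_background_efficiency(scores, eff):
    """Return the score cut that keeps roughly a fraction `eff` of the given background scores."""
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, int(eff * len(ranked)))
    return ranked[n_keep - 1]

rng = random.Random(0)

# Toy stand-ins (assumption): one feature per event, NOT the benchmark dataset.
background_train = [rng.gauss(100.0, 20.0) for _ in range(20000)]
background_test = [rng.gauss(100.0, 20.0) for _ in range(20000)]
signal_test = [rng.gauss(220.0, 15.0) for _ in range(2000)]

# The model sees background only; no signal hypothesis enters the fit or the cut.
mean, std = fit_background(background_train)
train_scores = [anomaly_score(x, mean, std) for x in background_train]
cut = threshold_at_background_efficiency(train_scores, eff=0.01)

def in_signal_region(x):
    """An event enters the signal region if its anomaly score passes the cut."""
    return anomaly_score(x, mean, std) >= cut

bkg_eff = sum(in_signal_region(x) for x in background_test) / len(background_test)
sig_eff = sum(in_signal_region(x) for x in signal_test) / len(signal_test)
print(f"background efficiency: {bkg_eff:.4f}")
print(f"signal efficiency:     {sig_eff:.4f}")
```

Because the cut is fixed from background alone, the same signal region can in principle be compared against any signal hypothesis afterwards; the toy signal here is only used to check how many of its events would pass.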

Research Organization:
Fermi National Accelerator Laboratory (FNAL), Batavia, IL (United States); The Ohio State University, Columbus, OH (United States)
Sponsoring Organization:
Australian Research Council (ARC); European Research Council (ERC); National Research Foundation of Korea (NRF); National Science Foundation (NSF); Science and Technology Facilities Council (STFC); USDOE; USDOE Office of Science (SC), High Energy Physics (HEP)
Grant/Contract Number:
AC02-07CH11359; SC0011726; SC0011925; SC0013607; SC0019227; SC0021187; SC0021396
OSTI ID:
1842562
Report Number(s):
FERMILAB-PUB--21-285-CMS; arXiv:2105.14027; 043
Journal Information:
Journal Name: SciPost Physics; Journal Volume: 12; Journal Issue: 1; ISSN 2542-4653
Publisher:
Stichting SciPost
Country of Publication:
Netherlands
Language:
English
