
Title: Classification without labels: learning from mixed samples in high energy physics

Journal Article · Journal of High Energy Physics (Online)
 [1];  [2];  [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
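
The central claim of the abstract, that a classifier optimal for separating two mixed samples is also optimal for separating the underlying classes, can be illustrated numerically. The sketch below is not the authors' code. It assumes a one-dimensional toy with unit-width Gaussian signal and background distributions and hypothetical signal fractions of 0.8 and 0.2 in the two mixed samples, and it uses the analytic likelihood ratio in place of a trained classifier. Under those assumptions, the mixed-sample likelihood ratio is a monotonic function of the pure signal/background likelihood ratio, so both rank events identically and give the same AUC on the pure classes.

```python
import numpy as np


def gaussian_pdf(x, mean, sigma=1.0):
    """Density of a normal distribution (toy stand-in for a class distribution)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def pure_likelihood_ratio(x, mu_s=1.0, mu_b=-1.0):
    """Optimal fully supervised score: p_S(x) / p_B(x)."""
    return gaussian_pdf(x, mu_s) / gaussian_pdf(x, mu_b)


def mixed_likelihood_ratio(x, f1, f2, mu_s=1.0, mu_b=-1.0):
    """Optimal score for separating mixed sample M1 from M2.

    M1 = f1 * S + (1 - f1) * B and M2 = f2 * S + (1 - f2) * B.
    The fractions f1, f2 are used only to build the toy; a classifier
    trained on M1 versus M2 would not need to know them.
    """
    p_s = gaussian_pdf(x, mu_s)
    p_b = gaussian_pdf(x, mu_b)
    return (f1 * p_s + (1.0 - f1) * p_b) / (f2 * p_s + (1.0 - f2) * p_b)


def auc(signal_scores, background_scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic (assumes no ties)."""
    scores = np.concatenate([signal_scores, background_scores])
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_s, n_b = len(signal_scores), len(background_scores)
    u = ranks[:n_s].sum() - n_s * (n_s + 1) / 2.0
    return u / (n_s * n_b)


# Pure test samples, used only to evaluate both scores on truth-level classes.
rng = np.random.default_rng(0)
signal = rng.normal(1.0, 1.0, 1000)
background = rng.normal(-1.0, 1.0, 1000)

auc_supervised = auc(pure_likelihood_ratio(signal), pure_likelihood_ratio(background))
auc_mixed = auc(
    mixed_likelihood_ratio(signal, 0.8, 0.2),
    mixed_likelihood_ratio(background, 0.8, 0.2),
)
print("AUC, pure likelihood ratio: ", auc_supervised)
print("AUC, mixed likelihood ratio:", auc_mixed)
```

The two printed values should agree because the mixed-sample score only relabels the pure score monotonically whenever the first sample has the larger signal fraction. If the fractions are swapped, the ordering is reversed, and if they are equal, the two mixed samples are identical and no discrimination is possible.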

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
USDOE Office of Science (SC)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
1421837
Journal Information:
Journal of High Energy Physics (Online), Vol. 2017, Issue 10; ISSN 1029-8479
Publisher:
Springer Berlin
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 125 works
Citation information provided by
Web of Science


