DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classification without labels: learning from mixed samples in high energy physics

Abstract

Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.

Authors:
 Metodiev, Eric M. [1]; Nachman, Benjamin [2]; Thaler, Jesse [1]
  1. Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
  2. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Publication Date:
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
1421837
Grant/Contract Number:  
AC02-05CH11231
Resource Type:
Accepted Manuscript
Journal Name:
Journal of High Energy Physics (Online)
Additional Journal Information:
Journal Name: Journal of High Energy Physics (Online); Journal Volume: 2017; Journal Issue: 10; Journal ID: ISSN 1029-8479
Publisher:
Springer Berlin
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; Jets

Citation Formats

Metodiev, Eric M., Nachman, Benjamin, and Thaler, Jesse. Classification without labels: learning from mixed samples in high energy physics. United States: N. p., 2017. Web. doi:10.1007/JHEP10(2017)174.
Metodiev, Eric M., Nachman, Benjamin, & Thaler, Jesse. Classification without labels: learning from mixed samples in high energy physics. United States. doi:10.1007/JHEP10(2017)174.
Metodiev, Eric M., Nachman, Benjamin, and Thaler, Jesse. 2017. "Classification without labels: learning from mixed samples in high energy physics". United States. doi:10.1007/JHEP10(2017)174. https://www.osti.gov/servlets/purl/1421837.
@article{osti_1421837,
title = {Classification without labels: learning from mixed samples in high energy physics},
author = {Metodiev, Eric M. and Nachman, Benjamin and Thaler, Jesse},
abstractNote = {Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.},
doi = {10.1007/JHEP10(2017)174},
journal = {Journal of High Energy Physics (Online)},
number = 10,
volume = 2017,
place = {United States},
year = {2017},
month = {10}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 23 works
Citation information provided by
Web of Science

Works referenced in this record:

Weak supervision and other non-standard classification problems: A taxonomy
journal, January 2016


Jet Substructure as a New Higgs-Search Channel at the Large Hadron Collider
journal, June 2008


Jet shapes and jet algorithms in SCET
journal, November 2010

  • Ellis, Stephen D.; Vermilion, Christopher K.; Walsh, Jonathan R.
  • Journal of High Energy Physics, Vol. 2010, Issue 11
  • DOI: 10.1007/JHEP11(2010)101

The anti-$$k_t$$ jet clustering algorithm
journal, April 2008


Soft drop
journal, May 2014

  • Larkoski, Andrew J.; Marzani, Simone; Soyez, Gregory
  • Journal of High Energy Physics, Vol. 2014, Issue 5
  • DOI: 10.1007/JHEP05(2014)146

Substructure of high-$$p_T$$ jets at the LHC
journal, April 2009


A brief introduction to PYTHIA 8.1
journal, June 2008

  • Sjöstrand, Torbjörn; Mrenna, Stephen; Skands, Peter
  • Computer Physics Communications, Vol. 178, Issue 11
  • DOI: 10.1016/j.cpc.2008.01.036

How much information is in a jet?
journal, June 2017


Identification of boosted, hadronically decaying W bosons and comparisons with ATLAS data taken at $$\sqrt{s} = 8$$ TeV
journal, March 2016


Jet observables without jet algorithms
journal, April 2014

  • Bertolini, Daniele; Chan, Tucker; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2014, Issue 4
  • DOI: 10.1007/JHEP04(2014)013

Identification of b-quark jets with the CMS experiment
journal, April 2013


Event shape–energy flow correlations
journal, July 2003


Jet-images — deep learning edition
journal, July 2016

  • de Oliveira, Luke; Kagan, Michael; Mackey, Lester
  • Journal of High Energy Physics, Vol. 2016, Issue 7
  • DOI: 10.1007/JHEP07(2016)069

Deep-learning top taggers or the end of QCD?
journal, May 2017

  • Kasieczka, Gregor; Plehn, Tilman; Russell, Michael
  • Journal of High Energy Physics, Vol. 2017, Issue 5
  • DOI: 10.1007/JHEP05(2017)006

Deep learning in color: towards automated quark/gluon jet discrimination
journal, January 2017

  • Komiske, Patrick T.; Metodiev, Eric M.; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2017, Issue 1
  • DOI: 10.1007/JHEP01(2017)110

Quark-gluon separation in three-jet events
journal, May 1981


Factorization for groomed jet substructure beyond the next-to-leading logarithm
journal, July 2016

  • Frye, Christopher; Larkoski, Andrew J.; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2016, Issue 7
  • DOI: 10.1007/JHEP07(2016)064

FastJet user manual: (for version 3.0.2)
journal, March 2012


Weakly supervised classification in high energy physics
journal, May 2017

  • Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco
  • Journal of High Energy Physics, Vol. 2017, Issue 5
  • DOI: 10.1007/JHEP05(2017)145

Jet-images: computer vision inspired techniques for jet tagging
journal, February 2015

  • Cogan, Josh; Kagan, Michael; Strauss, Emanuel
  • Journal of High Energy Physics, Vol. 2015, Issue 2
  • DOI: 10.1007/JHEP02(2015)118

Pure samples of quark and gluon jets at the LHC
journal, October 2011

  • Gallicchio, Jason; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2011, Issue 10
  • DOI: 10.1007/JHEP10(2011)103

Systematics of quark/gluon tagging
journal, July 2017

  • Gras, Philippe; Höche, Stefan; Kar, Deepak
  • Journal of High Energy Physics, Vol. 2017, Issue 7
  • DOI: 10.1007/JHEP07(2017)091

Jet shapes with the broadening axis
journal, April 2014

  • Larkoski, Andrew J.; Neill, Duff; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2014, Issue 4
  • DOI: 10.1007/JHEP04(2014)017

Playing tag with ANN: boosted top identification with pattern recognition
journal, July 2015

  • Almeida, Leandro G.; Backović, Mihailo; Cliche, Mathieu
  • Journal of High Energy Physics, Vol. 2015, Issue 7
  • DOI: 10.1007/JHEP07(2015)086

Classification with asymmetric label noise: Consistency and maximal denoising
journal, January 2016

  • Blanchard, Gilles; Flaska, Marek; Handy, Gregory
  • Electronic Journal of Statistics, Vol. 10, Issue 2
  • DOI: 10.1214/16-EJS1193

Quark and gluon jet substructure
journal, April 2013

  • Gallicchio, Jason; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2013, Issue 4
  • DOI: 10.1007/JHEP04(2013)090

Towards an understanding of jet substructure
journal, September 2013

  • Dasgupta, Mrinal; Fregoso, Alessandro; Marzani, Simone
  • Journal of High Energy Physics, Vol. 2013, Issue 9
  • DOI: 10.1007/JHEP09(2013)029

Using neural networks to identify jets
journal, February 1991


Light-quark and gluon jet discrimination in $$pp$$ collisions at $$\sqrt{s}=7\mathrm{\ TeV}$$ with the ATLAS detector
journal, August 2014


Jet trimming
journal, February 2010

  • Krohn, David; Thaler, Jesse; Wang, Lian-Tao
  • Journal of High Energy Physics, Vol. 2010, Issue 2
  • DOI: 10.1007/JHEP02(2010)084

Quark and Gluon Tagging at the LHC
journal, October 2011


Gaining (mutual) information about quark/gluon discrimination
journal, November 2014

  • Larkoski, Andrew J.; Thaler, Jesse; Waalewijn, Wouter J.
  • Journal of High Energy Physics, Vol. 2014, Issue 11
  • DOI: 10.1007/JHEP11(2014)129

    Works referencing / citing this record:

    The Machine Learning landscape of top taggers
    journal, January 2019


    A theory of quark vs. gluon discrimination
    journal, October 2019

    • Larkoski, Andrew J.; Metodiev, Eric M.
    • Journal of High Energy Physics, Vol. 2019, Issue 10
    • DOI: 10.1007/jhep10(2019)014

    An operational definition of quark and gluon jets
    journal, November 2018

    • Komiske, Patrick T.; Metodiev, Eric M.; Thaler, Jesse
    • Journal of High Energy Physics, Vol. 2018, Issue 11
    • DOI: 10.1007/jhep11(2018)059

    Quark jet versus gluon jet: fully-connected neural networks with high-level features
    journal, June 2019

    • Luo, Hui; Luo, Ming-Xing; Wang, Kai
    • Science China Physics, Mechanics & Astronomy, Vol. 62, Issue 9
    • DOI: 10.1007/s11433-019-9390-8

    Deep learning for R-parity violating supersymmetry searches at the LHC
    journal, October 2018


    Jet Topics: Disentangling Quarks and Gluons at Colliders
    journal, June 2018


    Reweighting a parton shower using a neural network: the final-state case
    journal, January 2019

    • Bothmann, Enrico; Del Debbio, Luigi
    • Journal of High Energy Physics, Vol. 2019, Issue 1
    • DOI: 10.1007/jhep01(2019)033

    QCD-aware recursive neural networks for jet physics
    journal, January 2019

    • Louppe, Gilles; Cho, Kyunghyun; Becot, Cyril
    • Journal of High Energy Physics, Vol. 2019, Issue 1
    • DOI: 10.1007/jhep01(2019)057

    Energy flow networks: deep sets for particle jets
    journal, January 2019

    • Komiske, Patrick T.; Metodiev, Eric M.; Thaler, Jesse
    • Journal of High Energy Physics, Vol. 2019, Issue 1
    • DOI: 10.1007/jhep01(2019)121

    (Machine) learning to do more with less
    journal, February 2018

    • Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan
    • Journal of High Energy Physics, Vol. 2018, Issue 2
    • DOI: 10.1007/jhep02(2018)034

    Infrared safety of a neural-net top tagging algorithm
    journal, February 2019

    • Choi, Suyong; Lee, Seung J.; Perelstein, Maxim
    • Journal of High Energy Physics, Vol. 2019, Issue 2
    • DOI: 10.1007/jhep02(2019)132

    Novel jet observables from machine learning
    journal, March 2018

    • Datta, Kaustuv; Larkoski, Andrew J.
    • Journal of High Energy Physics, Vol. 2018, Issue 3
    • DOI: 10.1007/jhep03(2018)086

    Investigating the topology dependence of quark and gluon jets
    journal, March 2019

    • Bright-Thonney, Samuel; Nachman, Benjamin
    • Journal of High Energy Physics, Vol. 2019, Issue 3
    • DOI: 10.1007/jhep03(2019)098

    Energy flow polynomials: a complete linear basis for jet substructure
    journal, April 2018

    • Komiske, Patrick T.; Metodiev, Eric M.; Thaler, Jesse
    • Journal of High Energy Physics, Vol. 2018, Issue 4
    • DOI: 10.1007/jhep04(2018)013

    Jet angularity measurements for single inclusive jet production
    journal, April 2018

    • Kang, Zhong-Bo; Lee, Kyle; Ringer, Felix
    • Journal of High Energy Physics, Vol. 2018, Issue 4
    • DOI: 10.1007/jhep04(2018)110

    Interpretable deep learning for two-prong jet classification with jet spectra
    journal, July 2019

    • Chakraborty, Amit; Lim, Sung Hak; Nojiri, Mihoko M.
    • Journal of High Energy Physics, Vol. 2019, Issue 7
    • DOI: 10.1007/jhep07(2019)135

    Jet charge and machine learning
    journal, October 2018

    • Fraser, Katherine; Schwartz, Matthew D.
    • Journal of High Energy Physics, Vol. 2018, Issue 10
    • DOI: 10.1007/jhep10(2018)093

    Boosting $$H\to b\overline{b}$$ with machine learning
    journal, October 2018

    • Lin, Joshua; Freytsis, Marat; Moult, Ian
    • Journal of High Energy Physics, Vol. 2018, Issue 10
    • DOI: 10.1007/jhep10(2018)101

    Pulling out all the tops with computer vision and deep learning
    journal, October 2018

    • Macaluso, Sebastian; Shih, David
    • Journal of High Energy Physics, Vol. 2018, Issue 10
    • DOI: 10.1007/jhep10(2018)121

    Adversarially-trained autoencoders for robust unsupervised new physics searches
    journal, October 2019

    • Blance, Andrew; Spannowsky, Michael; Waite, Philip
    • Journal of High Energy Physics, Vol. 2019, Issue 10
    • DOI: 10.1007/jhep10(2019)047

    The Lund jet plane
    journal, December 2018

    • Dreyer, Frédéric A.; Salam, Gavin P.; Soyez, Grégory
    • Journal of High Energy Physics, Vol. 2018, Issue 12
    • DOI: 10.1007/jhep12(2018)064

    Identifying the Relevant Dependencies of the Neural Network Response on Characteristics of the Input Space
    journal, September 2018

    • Wunsch, Stefan; Friese, Raphael; Wolf, Roger
    • Computing and Software for Big Science, Vol. 2, Issue 1
    • DOI: 10.1007/s41781-018-0012-1

    Solving differential equations with neural networks: Applications to the calculation of cosmological phase transitions
    journal, July 2019


    Uncovering latent jet substructure
    journal, September 2019


    Automating the construction of jet observables with machine learning
    journal, November 2019


    Learning to classify from impure samples with high-dimensional data
    journal, July 2018


    Extending the search for new resonances with machine learning
    journal, January 2019


    Anomaly Detection for Resonant New Physics with Machine Learning
    journal, December 2018


    binary junipr: An Interpretable Probabilistic Model for Discrimination
    journal, October 2019


    Machine learning and the physical sciences
    journal, December 2019


    Jet substructure at the Large Hadron Collider
    journal, December 2019


    Production of $$\tau \tau jj$$ final states at the LHC and the TauSpinner algorithm: the spin-2 case
    journal, January 2018


    Machine learning uncertainties with adversarial neural networks
    journal, January 2019


    JUNIPR: a framework for unsupervised machine learning in particle physics
    journal, February 2019


    Guiding new physics searches with unsupervised learning
    journal, March 2019


    QCD or what?
    journal, January 2019


    Quark-gluon tagging: Machine learning vs detector
    journal, January 2019


    Deep-learning jets with uncertainties and more
    journal, January 2020


    CapsNets continuing the convolutional quest
    journal, January 2020