DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Learning to classify from impure samples with high-dimensional data

Abstract

A persistent challenge in practical classification tasks is that labeled training sets are not always available. In particle physics, this challenge is surmounted by the use of simulations. These simulations accurately reproduce most features of data, but cannot be trusted to capture all of the complex correlations exploitable by modern machine learning methods. Recent work in weakly supervised learning has shown that simple, low-dimensional classifiers can be trained using only the impure mixtures present in data. Here, we demonstrate that complex, high-dimensional classifiers can also be trained on impure mixtures using weak supervision techniques, with performance comparable to what could be achieved with pure samples. Using weak supervision will therefore allow us to avoid relying exclusively on simulations for high-dimensional classification. Lastly, this work opens the door to a new regime whereby complex models are trained directly on data, providing direct access to probe the underlying physics.
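As a rough illustration of what training on impure mixtures can look like, the sketch below follows the "classification without labels" (CWoLa) idea listed among the referenced works: a classifier is trained only to tell two mixed samples apart, and is then evaluated on the true signal/background labels. It does not reproduce the paper's setup (high-dimensional inputs and neural networks). The one-dimensional Gaussian feature, the signal fractions 0.8 and 0.2, the logistic-regression model, and all function names are assumptions made purely for illustration.

```python
import math
import random


def sample_mixture(n, signal_fraction, rng):
    """Draw n toy events; each is signal with probability signal_fraction.

    Returns (features, true_labels). The true labels are used only for
    evaluation, never for training.
    """
    xs, ys = [], []
    for _ in range(n):
        is_signal = rng.random() < signal_fraction
        mean = 1.0 if is_signal else -1.0  # assumed toy feature distributions
        xs.append(rng.gauss(mean, 1.0))
        ys.append(1 if is_signal else 0)
    return xs, ys


def train_logistic(xs, labels, steps=200, lr=0.5):
    """Fit p(label=1 | x) = sigmoid(w*x + b) by full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Rank-based AUC (ties ignored): chance a positive outscores a negative."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(r for r, i in enumerate(order, start=1) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(0)
x1, truth1 = sample_mixture(2000, 0.8, rng)  # mixture 1: mostly signal
x2, truth2 = sample_mixture(2000, 0.2, rng)  # mixture 2: mostly background

# Training labels say only which mixture an event came from.
xs = x1 + x2
mixture_labels = [1] * len(x1) + [0] * len(x2)
w, b = train_logistic(xs, mixture_labels)

# Evaluate against the true signal/background labels.
scores = [w * x + b for x in xs]
pure_auc = auc(scores, truth1 + truth2)
print(f"AUC on true labels: {pure_auc:.3f}")
```

In this toy case the classifier never sees a true label, yet its score still orders signal above background, because the two mixtures differ only in their signal fraction.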

Authors:
Komiske, Patrick T.; Metodiev, Eric M.; Nachman, Benjamin; Schwartz, Matthew D.
Publication Date:
July 2018
Research Org.:
Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Nuclear Physics (NP) (SC-26); USDOE Office of Science (SC), High Energy Physics (HEP) (SC-25)
OSTI Identifier:
1460486
Alternate Identifier(s):
OSTI ID: 1482535
Grant/Contract Number:  
AC02-05CH11231; SC0013607; SC0011090; SC0012567
Resource Type:
Published Article
Journal Name:
Physical Review D
Additional Journal Information:
Journal Name: Physical Review D; Journal Volume: 98; Journal Issue: 1; Journal ID: ISSN 2470-0010
Publisher:
American Physical Society
Country of Publication:
United States
Language:
English
Subject:
72 PHYSICS OF ELEMENTARY PARTICLES AND FIELDS

Citation Formats

Komiske, Patrick T., Metodiev, Eric M., Nachman, Benjamin, and Schwartz, Matthew D. Learning to classify from impure samples with high-dimensional data. United States: N. p., 2018. Web. doi:10.1103/PhysRevD.98.011502.
Komiske, Patrick T., Metodiev, Eric M., Nachman, Benjamin, & Schwartz, Matthew D. Learning to classify from impure samples with high-dimensional data. United States. doi:10.1103/PhysRevD.98.011502.
Komiske, Patrick T., Metodiev, Eric M., Nachman, Benjamin, and Schwartz, Matthew D. 2018. "Learning to classify from impure samples with high-dimensional data". United States. doi:10.1103/PhysRevD.98.011502.
@article{osti_1460486,
title = {Learning to classify from impure samples with high-dimensional data},
author = {Komiske, Patrick T. and Metodiev, Eric M. and Nachman, Benjamin and Schwartz, Matthew D.},
abstractNote = {A persistent challenge in practical classification tasks is that labeled training sets are not always available. In particle physics, this challenge is surmounted by the use of simulations. These simulations accurately reproduce most features of data, but cannot be trusted to capture all of the complex correlations exploitable by modern machine learning methods. Recent work in weakly supervised learning has shown that simple, low-dimensional classifiers can be trained using only the impure mixtures present in data. Here, we demonstrate that complex, high-dimensional classifiers can also be trained on impure mixtures using weak supervision techniques, with performance comparable to what could be achieved with pure samples. Using weak supervision will therefore allow us to avoid relying exclusively on simulations for high-dimensional classification. Lastly, this work opens the door to a new regime whereby complex models are trained directly on data, providing direct access to probe the underlying physics.},
doi = {10.1103/PhysRevD.98.011502},
journal = {Physical Review D},
number = 1,
volume = 98,
place = {United States},
year = {2018},
month = {7}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record
DOI: 10.1103/PhysRevD.98.011502

Citation Metrics:
Cited by: 11 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: The AUC and training time of CWoLa (solid) and LLP (dashed) as the batch size is varied. Training times are measured on an NVIDIA Tesla K80 GPU using CUDA 8.0, TensorFlow 1.4.1, and Keras 2.1.2. AUC is a measure of classifier performance and is 1 for a perfect classifier and 0.5 for a completely random one.

Works referenced in this record:

Weak supervision and other non-standard classification problems: A taxonomy
journal, January 2016


The anti-$k_t$ jet clustering algorithm
journal, April 2008


A brief introduction to PYTHIA 8.1
journal, June 2008

  • Sjöstrand, Torbjörn; Mrenna, Stephen; Skands, Peter
  • Computer Physics Communications, Vol. 178, Issue 11
  • DOI: 10.1016/j.cpc.2008.01.036

Jet substructure classification in high-energy physics with deep neural networks
journal, May 2016


How much information is in a jet?
journal, June 2017


CaloGAN: Simulating 3D high energy particle showers in multilayer electromagnetic calorimeters with generative adversarial networks
journal, January 2018


Identification of boosted, hadronically decaying W bosons and comparisons with ATLAS data taken at $\sqrt{s} = 8$ TeV
journal, March 2016


(Machine) learning to do more with less
journal, February 2018

  • Cohen, Timothy; Freytsis, Marat; Ostdiek, Bryan
  • Journal of High Energy Physics, Vol. 2018, Issue 2
  • DOI: 10.1007/JHEP02(2018)034

Identification of b-quark jets with the CMS experiment
journal, April 2013


Classification without labels: learning from mixed samples in high energy physics
journal, October 2017

  • Metodiev, Eric M.; Nachman, Benjamin; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2017, Issue 10
  • DOI: 10.1007/JHEP10(2017)174

Pileup Mitigation with Machine Learning (PUMML)
journal, December 2017

  • Komiske, Patrick T.; Metodiev, Eric M.; Nachman, Benjamin
  • Journal of High Energy Physics, Vol. 2017, Issue 12
  • DOI: 10.1007/JHEP12(2017)051

Jet-images — deep learning edition
journal, July 2016

  • de Oliveira, Luke; Kagan, Michael; Mackey, Lester
  • Journal of High Energy Physics, Vol. 2016, Issue 7
  • DOI: 10.1007/JHEP07(2016)069

Deep-learning top taggers or the end of QCD?
journal, May 2017

  • Kasieczka, Gregor; Plehn, Tilman; Russell, Michael
  • Journal of High Energy Physics, Vol. 2017, Issue 5
  • DOI: 10.1007/JHEP05(2017)006

Deep learning in color: towards automated quark/gluon jet discrimination
journal, January 2017

  • Komiske, Patrick T.; Metodiev, Eric M.; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2017, Issue 1
  • DOI: 10.1007/JHEP01(2017)110

FastJet user manual: (for version 3.0.2)
journal, March 2012


Weakly supervised classification in high energy physics
journal, May 2017

  • Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco
  • Journal of High Energy Physics, Vol. 2017, Issue 5
  • DOI: 10.1007/JHEP05(2017)145

Jet-images: computer vision inspired techniques for jet tagging
journal, February 2015

  • Cogan, Josh; Kagan, Michael; Strauss, Emanuel
  • Journal of High Energy Physics, Vol. 2015, Issue 2
  • DOI: 10.1007/JHEP02(2015)118

Energy flow polynomials: a complete linear basis for jet substructure
journal, April 2018

  • Komiske, Patrick T.; Metodiev, Eric M.; Thaler, Jesse
  • Journal of High Energy Physics, Vol. 2018, Issue 4
  • DOI: 10.1007/JHEP04(2018)013

Pure samples of quark and gluon jets at the LHC
journal, October 2011

  • Gallicchio, Jason; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2011, Issue 10
  • DOI: 10.1007/JHEP10(2011)103

Identification of heavy-flavour jets with the CMS detector in pp collisions at 13 TeV
journal, May 2018


Systematics of quark/gluon tagging
journal, July 2017

  • Gras, Philippe; Höche, Stefan; Kar, Deepak
  • Journal of High Energy Physics, Vol. 2017, Issue 7
  • DOI: 10.1007/JHEP07(2017)091

Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis
journal, September 2017

  • de Oliveira, Luke; Paganini, Michela; Nachman, Benjamin
  • Computing and Software for Big Science, Vol. 1, Issue 1
  • DOI: 10.1007/s41781-017-0004-6

Identification of high transverse momentum top quarks in pp collisions at $\sqrt{s} = 8$ TeV with the ATLAS detector
journal, June 2016

  • Aad, G.; Abbott, B.; Abdallah, J.
  • Journal of High Energy Physics, Vol. 2016, Issue 6
  • DOI: 10.1007/JHEP06(2016)093

Playing tag with ANN: boosted top identification with pattern recognition
journal, July 2015

  • Almeida, Leandro G.; Backović, Mihailo; Cliche, Mathieu
  • Journal of High Energy Physics, Vol. 2015, Issue 7
  • DOI: 10.1007/JHEP07(2015)086

Accelerating Science with Generative Adversarial Networks: An Application to 3D Particle Showers in Multilayer Calorimeters
journal, January 2018


Classification with asymmetric label noise: Consistency and maximal denoising
journal, January 2016

  • Blanchard, Gilles; Flaska, Marek; Handy, Gregory
  • Electronic Journal of Statistics, Vol. 10, Issue 2
  • DOI: 10.1214/16-EJS1193

Quark and gluon jet substructure
journal, April 2013

  • Gallicchio, Jason; Schwartz, Matthew D.
  • Journal of High Energy Physics, Vol. 2013, Issue 4
  • DOI: 10.1007/JHEP04(2013)090

Light-quark and gluon jet discrimination in $pp$ collisions at $\sqrt{s} = 7$ TeV with the ATLAS detector
journal, August 2014


Identification techniques for highly boosted W bosons that decay into hadrons
journal, December 2014

  • Khachatryan, V.; Sirunyan, A. M.; Tumasyan, A.
  • Journal of High Energy Physics, Vol. 2014, Issue 12
  • DOI: 10.1007/JHEP12(2014)017

Parton shower uncertainties in jet substructure analyses with deep neural networks
journal, January 2017


Works referencing / citing this record:

Effective Diagnosis of Alzheimer’s Disease via Multimodal Fusion Analysis Framework
journal, October 2019

Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.