DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Learning from crowds with variational Gaussian processes

Abstract

Solving a supervised learning problem requires a labeled training set. This labeling is traditionally performed by an expert, who provides a label for each sample. The proliferation of social web services (e.g., Amazon Mechanical Turk) has introduced an alternative: crowdsourcing. Anybody with a computer can register with one of these services and label a dataset, either partially or completely, so the labeling effort is shared among a large number of annotators. However, this approach raises scientifically challenging problems, such as combining annotators of unknown expertise, handling disagreements on the annotated samples, and detecting spammers and adversarial annotators. These problems require probabilistically sound solutions that go beyond naively applying majority voting followed by a classical classification method. In this work we introduce a new crowdsourcing model and inference procedure that trains a Gaussian Process classifier using the noisy labels provided by the annotators. Variational Bayes inference is used to estimate all unknowns. The proposed model can predict the class of new samples and assess the expertise of the annotators involved. Moreover, the Bayesian treatment provides solid uncertainty quantification. Since some annotations may already be available for a new sample at prediction time, we also show how our method naturally incorporates this additional information. Furthermore, a comprehensive experimental section evaluates the proposed method on synthetic and real data, showing that it consistently outperforms other state-of-the-art crowdsourcing approaches.
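The abstract contrasts naive majority voting with a model that accounts for annotator expertise, and mentions folding annotations that are already available for a new sample into its prediction. The Python sketch below illustrates both ideas for a single binary sample. It assumes a per-annotator "two-coin" model with sensitivity `alpha` and specificity `beta`; this model, the function names, the annotator identifiers, and all numeric values are hypothetical illustrations rather than the paper's formulation. The classifier probability `p1` is simply given here, whereas in the paper's setting it would come from the variationally trained GP classifier.

```python
# Hypothetical sketch, not the paper's implementation: binary labels, and a
# "two-coin" annotator model in which annotator a has sensitivity alpha[a]
# = P(y_a = 1 | z = 1) and specificity beta[a] = P(y_a = 0 | z = 0).
# The classifier probability p1 = P(z = 1 | x) is given as a number here;
# in the paper's setting that role is played by the GP classifier.
from collections import Counter
from typing import Dict


def majority_vote(labels: Dict[str, int]) -> int:
    """Naive baseline: the most frequent label (ties go to class 1)."""
    counts = Counter(labels.values())
    return 1 if counts[1] >= counts[0] else 0


def posterior_class1(p1: float, labels: Dict[str, int],
                     alpha: Dict[str, float], beta: Dict[str, float]) -> float:
    """P(z = 1 | x, annotations) by Bayes' rule.

    Annotations are assumed conditionally independent given the true class z.
    """
    like1, like0 = p1, 1.0 - p1  # unnormalized joint for z = 1 and z = 0
    for a, y in labels.items():
        if y == 1:
            like1 *= alpha[a]          # true positive
            like0 *= 1.0 - beta[a]     # false positive
        else:
            like1 *= 1.0 - alpha[a]    # false negative
            like0 *= beta[a]           # true negative
    return like1 / (like1 + like0)


# Hypothetical annotators: one reliable, two spammers (alpha + beta = 1).
labels = {"expert": 1, "spam1": 0, "spam2": 0}
alpha = {"expert": 0.9, "spam1": 0.5, "spam2": 0.5}
beta = {"expert": 0.9, "spam1": 0.5, "spam2": 0.5}

print(majority_vote(labels))                       # 0
print(posterior_class1(0.5, labels, alpha, beta))  # approximately 0.9

# An adversarial annotator (alpha + beta < 1) has its vote effectively inverted.
print(posterior_class1(0.5, {"adv": 1}, {"adv": 0.1}, {"adv": 0.1}))  # approximately 0.1
```

In this toy case majority voting follows the two spammers and returns class 0, while the weighted posterior ignores them (an annotator with alpha + beta = 1 gives the same label distribution under both classes) and follows the reliable annotator. In a full crowdsourcing model, alpha, beta, and the classifier would be estimated jointly from the data rather than fixed by hand.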

Authors:
Ruiz, Pablo [1]; Morales-Alvarez, Pablo [2]; Molina, Rafael [2]; Katsaggelos, Aggelos [1]
  1. Northwestern University, Evanston, IL (United States)
  2. University of Granada, Granada (Spain)
Publication Date:
Tue Nov 20 2018
Research Org.:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Org.:
USDOE National Nuclear Security Administration (NNSA); Spanish Ministry of Economy and Competitiveness; University of Granada; La Caixa Banking Foundation
OSTI Identifier:
1488416
Alternate Identifier(s):
OSTI ID: 1636753
Grant/Contract Number:  
NA0002520; DPI2016-77869-C2-2-R
Resource Type:
Accepted Manuscript
Journal Name:
Pattern Recognition
Additional Journal Information:
Journal Volume: 88; Journal Issue: C; Journal ID: ISSN 0031-3203
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; crowdsourcing; classification; Gaussian processes; Bayesian modeling; variational inference

Citation Formats

Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, and Katsaggelos, Aggelos. Learning from crowds with variational Gaussian processes. United States: N. p., 2018. Web. doi:10.1016/j.patcog.2018.11.021.
Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, & Katsaggelos, Aggelos. Learning from crowds with variational Gaussian processes. United States. https://doi.org/10.1016/j.patcog.2018.11.021
Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, and Katsaggelos, Aggelos. Tue Nov 20 2018. "Learning from crowds with variational Gaussian processes". United States. https://doi.org/10.1016/j.patcog.2018.11.021. https://www.osti.gov/servlets/purl/1488416.
@article{osti_1488416,
title = {Learning from crowds with variational Gaussian processes},
author = {Ruiz, Pablo and Morales-Alvarez, Pablo and Molina, Rafael and Katsaggelos, Aggelos},
doi = {10.1016/j.patcog.2018.11.021},
journal = {Pattern Recognition},
number = {C},
volume = {88},
place = {United States},
year = {2018},
month = nov
}


Citation Metrics:
Cited by: 21 works
Citation information provided by Web of Science

