Learning from crowds with variational Gaussian processes
Abstract
Solving a supervised learning problem requires labeling a training set. This task is traditionally performed by an expert, who provides a label for each sample. The proliferation of social web services (e.g., Amazon Mechanical Turk) has introduced an alternative crowdsourcing approach. Anybody with a computer can register in one of these services and label, either partially or completely, a dataset. The effort of labeling is then shared between a great number of annotators. However, this approach introduces scientifically challenging problems such as combining the unknown expertise of the annotators, handling disagreements on the annotated samples, or detecting the existence of spammer and adversarial annotators. All these problems require probabilistically sound solutions which go beyond the naive use of majority voting plus classical classification methods. In this work we introduce a new crowdsourcing model and inference procedure which trains a Gaussian Process classifier using the noisy labels provided by the annotators. Variational Bayes inference is used to estimate all unknowns. The proposed model can predict the class of new samples and assess the expertise of the involved annotators. Moreover, the Bayesian treatment allows for a solid uncertainty quantification. Since when predicting the class of a new sample we might have access to some annotations for it, we also show how our method can naturally incorporate this additional information. Furthermore, a comprehensive experimental section evaluates the proposed method with synthetic and real experiments, showing that it consistently outperforms other state-of-the-art crowdsourcing approaches.
- Authors:
- Ruiz, Pablo; Morales-Alvarez, Pablo; Molina, Rafael; Katsaggelos, Aggelos
- Northwestern University, Evanston, IL (United States)
- University of Granada, Granada (Spain)
- Publication Date:
- 2018-11-20
- Research Org.:
- Northwestern Univ., Evanston, IL (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); Spanish Ministry of Economy and Competitiveness; University of Granada; La Caixa Banking Foundation
- OSTI Identifier:
- 1488416
- Alternate Identifier(s):
- OSTI ID: 1636753
- Grant/Contract Number:
- NA0002520; DPI2016-77869-C2-2-R
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Pattern Recognition
- Additional Journal Information:
- Journal Volume: 88; Journal Issue: C; Journal ID: ISSN 0031-3203
- Publisher:
- Elsevier
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; crowdsourcing; classification; gaussian processes; Bayesian modeling; variational inference
Citation Formats
Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, and Katsaggelos, Aggelos. Learning from crowds with variational Gaussian processes. United States: N. p., 2018. Web. doi:10.1016/j.patcog.2018.11.021.
Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, & Katsaggelos, Aggelos. Learning from crowds with variational Gaussian processes. United States. https://doi.org/10.1016/j.patcog.2018.11.021
Ruiz, Pablo, Morales-Alvarez, Pablo, Molina, Rafael, and Katsaggelos, Aggelos. 2018. "Learning from crowds with variational Gaussian processes". United States. https://doi.org/10.1016/j.patcog.2018.11.021. https://www.osti.gov/servlets/purl/1488416.
@article{osti_1488416,
title = {Learning from crowds with variational Gaussian processes},
author = {Ruiz, Pablo and Morales-Alvarez, Pablo and Molina, Rafael and Katsaggelos, Aggelos},
abstractNote = {Solving a supervised learning problem requires labeling a training set. This task is traditionally performed by an expert, who provides a label for each sample. The proliferation of social web services (e.g., Amazon Mechanical Turk) has introduced an alternative crowdsourcing approach. Anybody with a computer can register in one of these services and label, either partially or completely, a dataset. The effort of labeling is then shared between a great number of annotators. However, this approach introduces scientifically challenging problems such as combining the unknown expertise of the annotators, handling disagreements on the annotated samples, or detecting the existence of spammer and adversarial annotators. All these problems require probabilistically sound solutions which go beyond the naive use of majority voting plus classical classification methods. In this work we introduce a new crowdsourcing model and inference procedure which trains a Gaussian Process classifier using the noisy labels provided by the annotators. Variational Bayes inference is used to estimate all unknowns. The proposed model can predict the class of new samples and assess the expertise of the involved annotators. Moreover, the Bayesian treatment allows for a solid uncertainty quantification. Since when predicting the class of a new sample we might have access to some annotations for it, we also show how our method can naturally incorporate this additional information. Furthermore, a comprehensive experimental section evaluates the proposed method with synthetic and real experiments, showing that it consistently outperforms other state-of-the-art crowdsourcing approaches.},
doi = {10.1016/j.patcog.2018.11.021},
journal = {Pattern Recognition},
number = {C},
volume = 88,
place = {United States},
year = {2018},
month = nov
}