OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Learning from crowds with variational Gaussian processes

Journal Article · Pattern Recognition

Solving a supervised learning problem requires labeling a training set. This task is traditionally performed by an expert, who provides a label for each sample. The proliferation of social web services (e.g., Amazon Mechanical Turk) has introduced an alternative crowdsourcing approach. Anybody with a computer can register with one of these services and label a dataset, either partially or completely. The labeling effort is then shared among a large number of annotators. However, this approach introduces scientifically challenging problems, such as combining the unknown expertise of the annotators, handling disagreements on the annotated samples, or detecting spammer and adversarial annotators. All these problems require probabilistically sound solutions that go beyond the naive use of majority voting plus classical classification methods. In this work we introduce a new crowdsourcing model and inference procedure that trains a Gaussian process classifier using the noisy labels provided by the annotators. Variational Bayes inference is used to estimate all unknowns. The proposed model can predict the class of new samples and assess the expertise of the annotators involved. Moreover, the Bayesian treatment allows for solid uncertainty quantification. Because some annotations may be available for a new sample at prediction time, we also show how our method can naturally incorporate this additional information. Furthermore, a comprehensive experimental section evaluates the proposed method on synthetic and real data, showing that it consistently outperforms other state-of-the-art crowdsourcing approaches.
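
For orientation, the "naive use of majority voting" that the abstract argues against can be sketched as follows. This is a minimal illustration of that baseline only, not the paper's variational Gaussian process method. The `MISSING` marker, the function names, the tie-breaking rule, and the toy label matrix are all assumptions made for this example.

```python
import numpy as np

MISSING = -1  # assumed marker: the annotator did not label this sample


def majority_vote(annotations):
    """Majority binary label per sample, ignoring missing entries.

    annotations: (n_samples, n_annotators) array with values in {0, 1, MISSING}.
    Ties are broken in favor of class 1 (an arbitrary choice for this sketch).
    """
    annotations = np.asarray(annotations)
    observed = annotations != MISSING
    votes_for_one = np.sum((annotations == 1) & observed, axis=1)
    n_votes = observed.sum(axis=1)
    return (2 * votes_for_one >= n_votes).astype(int)


def agreement_with_majority(annotations):
    """Fraction of each annotator's provided labels that match the majority vote."""
    annotations = np.asarray(annotations)
    observed = annotations != MISSING
    consensus = majority_vote(annotations)[:, None]
    matches = (annotations == consensus) & observed
    # Guard against annotators who labeled nothing.
    return matches.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)


# Toy data: 5 samples, 3 annotators; the third annotator always disagrees.
labels = np.array([
    [1, 1, 0],
    [0, 0, 1],
    [1, MISSING, 0],
    [0, 0, 1],
    [1, 1, MISSING],
])
consensus = majority_vote(labels)
agreement = agreement_with_majority(labels)
```

In this toy matrix the third sample is a one-to-one tie that the baseline resolves arbitrarily, and every vote carries equal weight regardless of annotator reliability. These are the kinds of limitations that motivate modeling annotator expertise probabilistically, as the abstract describes.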

Research Organization:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA); Spanish Ministry of Economy and Competitiveness; University of Granada; La Caixa Banking Foundation
Grant/Contract Number:
NA0002520; DPI2016-77869-C2-2-R
OSTI ID:
1488416
Alternate ID(s):
OSTI ID: 1636753
Journal Information:
Pattern Recognition, Vol. 88, Issue C; ISSN 0031-3203
Publisher:
Elsevier
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 21 works
Citation information provided by Web of Science

