U.S. Department of Energy
Office of Scientific and Technical Information

Scalable and efficient learning from crowds with Gaussian processes

Journal Article · Information Fusion
Over the last few years, multiply-annotated data has become a very popular source of information. Online platforms such as Amazon Mechanical Turk have revolutionized the labelling process needed for any classification task by sharing the effort among a number of annotators (instead of relying on the classical single expert). This crowdsourcing approach has introduced challenging new problems, such as handling disagreements on the annotated samples and combining the unknown expertise of the annotators. Probabilistic methods, such as Gaussian Processes (GP), have proven successful at modelling this new crowdsourcing scenario. However, GPs do not scale well with the training set size, which makes them prohibitively expensive for medium-to-large datasets (beyond 10K training instances). This constitutes a serious limitation for current real-world applications. In this work, we introduce two scalable and efficient GP-based crowdsourcing methods that allow for processing previously prohibitive datasets. The first one is an efficient and fast approximation to a GP with a squared exponential (SE) kernel. The second allows for learning a more flexible kernel at the expense of heavier training (while still scaling to large datasets). Since the latter is not a GP-SE approximation, it can also be considered a whole new scalable and efficient crowdsourcing method, useful for any dataset size. Both methods use Fourier features and variational inference, can predict the class of new samples, and can estimate the expertise of the annotators involved. A comprehensive set of experiments compares them with state-of-the-art probabilistic approaches on synthetic and real crowdsourcing datasets of different sizes. They stand out as the best-performing approaches for large-scale problems. Moreover, the second method is competitive with the current state of the art for small datasets.
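The abstract mentions Fourier features as the route to a scalable approximation of a GP with an SE kernel. The following is a minimal sketch of the standard random Fourier feature construction for that kernel, not the authors' implementation: the function names, the data, and the values of `n`, `d`, `D`, and `lengthscale` are all illustrative assumptions. It shows only that an inner product of D random cosine features approximates the exact SE kernel matrix; the crowdsourcing likelihood and the variational inference described in the abstract are not covered.

```python
import numpy as np


def se_kernel(X, Y, lengthscale=1.0):
    """Exact squared exponential kernel: exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def random_fourier_features(X, W, b):
    """Map X (n x d) to D cosine features so that Phi @ Phi.T approximates the kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


# Illustrative sizes and hyperparameters (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n, d, D = 200, 5, 2000
lengthscale = 1.5

X = rng.normal(size=(n, d))

# For the SE kernel, frequencies are drawn from N(0, 1 / lengthscale^2)
# and phases uniformly from [0, 2*pi).
W = rng.normal(scale=1.0 / lengthscale, size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Phi = random_fourier_features(X, W, b)

K_exact = se_kernel(X, X, lengthscale)
K_approx = Phi @ Phi.T

mean_abs_err = np.mean(np.abs(K_exact - K_approx))
max_abs_err = np.max(np.abs(K_exact - K_approx))
print(f"mean abs error: {mean_abs_err:.4f}, max abs error: {max_abs_err:.4f}")
```

With an explicit feature matrix of shape n x D, a linear model in the features can be trained at a cost that grows linearly in n (roughly O(n D^2)), instead of the O(n^3) cost of working with the full n x n kernel matrix. The approximation error shrinks as D grows, at a rate of about 1/sqrt(D).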
Research Organization:
Northwestern Univ., Evanston, IL (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
NA0002520
OSTI ID:
1801110
Alternate ID(s):
OSTI ID: 1547904
Journal Information:
Journal Name: Information Fusion; Journal Volume: 52; Journal Issue: C; ISSN 1566-2535
Publisher:
Elsevier
Country of Publication:
United States
Language:
English

Similar Records

Learning from crowds with variational Gaussian processes
Journal Article · Mon Nov 19 23:00:00 EST 2018 · Pattern Recognition · OSTI ID:1488416

Improve Learning from Crowds via Generative Augmentation
Conference · Sat Aug 14 00:00:00 EDT 2021 · Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining · OSTI ID:1822655

Learning from Crowds by Modeling Common Confusions
Conference · Mon Feb 08 23:00:00 EST 2021 · OSTI ID:1822656