U.S. Department of Energy
Office of Scientific and Technical Information

Improve Learning from Crowds via Generative Augmentation

Conference · Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
 [1];  [2]
  1. University of Virginia, Charlottesville, VA, USA; Computer Science, University of Virginia
  2. University of Virginia, Charlottesville, VA, USA
Crowdsourcing provides an efficient way to collect labels for supervised machine learning. However, to control annotation cost, each instance in crowdsourced data is typically annotated by only a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to learn a classifier directly by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, which is measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, which is measured by an auxiliary network. Extensive experiments on three real-world datasets, including comparisons against an array of state-of-the-art learning-from-crowds methods, demonstrate the effectiveness of our data augmentation framework. The results also show the potential of our algorithm for low-budget crowdsourcing in general.
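The two augmentation principles in the abstract can be illustrated with a small sketch of the loss terms involved. This is not the authors' implementation: the abstract does not say how the mutual information is estimated, so the sketch assumes an InfoGAN-style variational lower bound with a uniform label prior, and every function name and the weight `lam` are hypothetical. Inputs are plain lists of probabilities that a discriminator and an auxiliary network would output.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(p, target):
    """Binary cross-entropy for a single probability p and a 0/1 target."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Principle 1: the discriminator separates authentic annotations
    (target 1) from generated ones (target 0)."""
    real = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real + fake


def mi_lower_bound(aux_probs, labels, num_classes):
    """Principle 2 (assumed estimator): H(y) + E[log q(y | generated annotation)],
    where q is the auxiliary network and y is assumed uniform over classes."""
    avg_log_q = sum(
        math.log(max(q[y], EPS)) for q, y in zip(aux_probs, labels)
    ) / len(labels)
    return math.log(num_classes) + avg_log_q


def generator_loss(d_fake, aux_probs, labels, num_classes, lam=1.0):
    """Generator tries to fool the discriminator (non-saturating form)
    while keeping generated annotations informative about the label."""
    adversarial = sum(bce(p, 1) for p in d_fake) / len(d_fake)
    return adversarial - lam * mi_lower_bound(aux_probs, labels, num_classes)
```

In this sketch, an auxiliary network that cannot recover the label from a generated annotation (uniform output) gives a bound of zero, so the generator gains nothing from the information term; a perfectly confident, correct auxiliary prediction gives the maximum value, the log of the number of classes.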
Research Organization:
University of Virginia
Sponsoring Organization:
U.S. Department of Energy
DOE Contract Number:
EE0008227
OSTI ID:
1822655
Report Number(s):
DOE-UVA-0008227-5; 1718216, 1553568
Conference Information:
Journal Name: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
Country of Publication:
United States
Language:
English
