OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Improve Learning from Crowds via Generative Augmentation

Conference · Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
 [1];  [1]
  1. University of Virginia, Charlottesville, VA, USA

Crowdsourcing provides an efficient label collection scheme for supervised machine learning. However, to control annotation cost, each instance in crowdsourced data is typically annotated by only a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, as measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, as measured by an auxiliary network. Extensive experiments and comparisons against an array of state-of-the-art learning-from-crowds methods on three real-world datasets demonstrate the effectiveness of our data augmentation framework and show the potential of our algorithm for low-budget crowdsourcing in general.
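
The two augmentation principles in the abstract can be read as two terms in a generator objective. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the discriminator outputs a probability that an annotation is authentic, and that mutual information is approximated by the cross-entropy of an auxiliary classifier that recovers the label from a generated annotation (the variational lower bound used in InfoGAN-style models). The abstract does not specify the estimator. The names `generator_loss`, `disc_scores`, `aux_probs`, and `info_weight` are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def bce(prob, target):
    """Binary cross-entropy for a single probability/target pair."""
    return -(target * math.log(prob + EPS)
             + (1.0 - target) * math.log(1.0 - prob + EPS))


def cross_entropy(class_probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(class_probs[label] + EPS)


def generator_loss(disc_scores, aux_probs, labels, info_weight=1.0):
    """Combine both principles for a batch of generated annotations.

    disc_scores: assumed discriminator outputs, P(annotation is authentic).
    aux_probs:   assumed auxiliary-network class distributions, one per
                 generated annotation.
    labels:      the label each annotation was generated for.
    info_weight: hypothetical trade-off weight between the two terms.
    """
    n = len(labels)
    # Principle 1: generated annotations should look authentic, so the
    # generator is rewarded when the discriminator scores them near 1.
    adv = sum(bce(s, 1.0) for s in disc_scores) / n
    # Principle 2: a lower cross-entropy means the label is easier to
    # recover from the annotation, which raises the lower bound on the
    # mutual information between annotation and label.
    info = sum(cross_entropy(p, y) for p, y in zip(aux_probs, labels)) / n
    return adv + info_weight * info, adv, info


# Toy batch: realistic, label-informative annotations ...
good = generator_loss([0.9, 0.8], [[0.9, 0.1], [0.2, 0.8]], [0, 1])
# ... versus unrealistic annotations that carry no label information.
bad = generator_loss([0.2, 0.1], [[0.5, 0.5], [0.5, 0.5]], [0, 1])
print(good[0] < bad[0])
```

In this toy comparison the first batch incurs a lower loss on both terms, which is the behavior the two principles ask for. In an actual training loop the scores would come from trained networks, not from hand-written lists.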

Research Organization:
University of Virginia
Sponsoring Organization:
USDOE
DOE Contract Number:
EE0008227
OSTI ID:
1822655
Report Number(s):
DOE-UVA-0008227-5; 1718216, 1553568
Journal Information:
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Conference: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14–18, 2021
Country of Publication:
United States
Language:
English

Similar Records

Learning from Crowds by Modeling Common Confusions
Conference · February 9, 2021 · OSTI ID:1822655

Learning from crowds with variational Gaussian processes
Journal Article · November 20, 2018 · Pattern Recognition · OSTI ID:1822655

SeismoGen: Seismic Waveform Synthesis Using GAN With Application to Seismic Data Augmentation
Journal Article · April 16, 2021 · Journal of Geophysical Research. Solid Earth · OSTI ID:1822655