Improve Learning from Crowds via Generative Augmentation
- University of Virginia, Charlottesville, VA, USA
Crowdsourcing provides an efficient label collection scheme for supervised machine learning. However, to control annotation cost, each instance in crowdsourced data is typically annotated by only a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to learn a classifier directly by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, which is measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, which is measured by an auxiliary network. Extensive experiments and comparisons against an array of state-of-the-art learning-from-crowds methods on three real-world datasets demonstrate the effectiveness of our data augmentation framework. The results also show the potential of our algorithm for low-budget crowdsourcing in general.
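The two augmentation principles in the abstract can be read as two terms of a generator loss. The following is a minimal sketch, not the paper's implementation: it assumes a non-saturating adversarial loss for principle 1 and an InfoGAN-style variational lower bound on mutual information for principle 2. All function names, the weight `lam`, and the toy probabilities are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def adversarial_term(d_fake):
    """Principle 1 (assumed non-saturating GAN loss).

    d_fake: discriminator probabilities that each generated annotation
    is authentic. The term is small when the discriminator is fooled.
    """
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


def mutual_info_term(aux_probs, labels):
    """Principle 2 (assumed variational bound, as in InfoGAN).

    aux_probs: per-sample class distributions predicted by an auxiliary
    network from the generated annotation. labels: class indices.
    Minimizing this cross-entropy maximizes a lower bound on the mutual
    information between labels and generated annotations, up to the
    constant label entropy.
    """
    total = sum(math.log(max(p[y], EPS)) for p, y in zip(aux_probs, labels))
    return -total / len(labels)


def generator_loss(d_fake, aux_probs, labels, lam=1.0):
    """Weighted sum of the two terms; lam is a hypothetical trade-off weight."""
    return adversarial_term(d_fake) + lam * mutual_info_term(aux_probs, labels)


# Toy comparison: uninformative vs. realistic, label-revealing annotations.
labels = [0, 1]
weak = generator_loss([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], labels)
strong = generator_loss([0.9, 0.9], [[0.9, 0.1], [0.1, 0.9]], labels)
```

In this toy comparison, `strong` is lower than `weak` because its generated annotations both fool the discriminator and let the auxiliary network recover the labels.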
- Research Organization:
- University of Virginia
- Sponsoring Organization:
- USDOE
- DOE Contract Number:
- EE0008227
- OSTI ID:
- 1822655
- Report Number(s):
- DOE-UVA-0008227-5; 1718216, 1553568
- Journal Information:
- Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Conference: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021
- Country of Publication:
- United States
- Language:
- English
Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm (journal, January 1979)
LabelMe: A Database and Web-Based Tool for Image Annotation (journal, October 2007)
Similar Records
Learning from crowds with variational Gaussian processes
SeismoGen: Seismic Waveform Synthesis Using GAN With Application to Seismic Data Augmentation