Improve Learning from Crowds via Generative Augmentation

Chu, Zhendong; Wang, Hongning

doi:10.1145/3447548.3467409

Conference · August 14, 2021 · Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

DOI: https://doi.org/10.1145/3447548.3467409 · OSTI ID: 1822655

Chu, Zhendong [1]; Wang, Hongning [1]

[1] University of Virginia, Charlottesville, VA, USA

Crowdsourcing provides an efficient label collection scheme for supervised machine learning. However, to control annotation cost, each instance in crowdsourced data is typically annotated by only a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, as measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, as measured by an auxiliary network. Extensive experiments and comparisons against an array of state-of-the-art learning-from-crowds methods on three real-world datasets demonstrate the effectiveness of our data augmentation framework and show the potential of our algorithm for low-budget crowdsourcing in general.
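The two augmentation principles in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names, the `-1` missing-label convention, the non-saturating adversarial loss, and the cross-entropy surrogate for mutual information (a standard variational lower bound in the style of InfoGAN) are all assumptions made for illustration, and the discriminator and auxiliary-network outputs are passed in as plain arrays rather than computed by trained networks.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def annotation_sparsity(annotations, missing=-1):
    # Fraction of (instance, annotator) cells that carry no label.
    # `missing` is a hypothetical marker for "annotator did not label this instance".
    return float(np.mean(annotations == missing))

def generator_loss(d_fake_prob, aux_logits, labels, mi_weight=1.0):
    # d_fake_prob: assumed discriminator output, probability that each
    #              generated annotation is authentic, shape (n,).
    # aux_logits:  assumed auxiliary-network logits over classes given the
    #              generated annotation, shape (n, k).
    # labels:      integer class labels the generator was conditioned on, shape (n,).
    eps = 1e-12
    # Principle 1: generated annotations should look authentic to the discriminator.
    adv = -np.mean(np.log(d_fake_prob + eps))
    # Principle 2: generated annotations should be informative about the label.
    # Minimizing this cross-entropy maximizes a variational lower bound on
    # the mutual information (up to the constant label entropy).
    log_q = np.log(softmax(aux_logits) + eps)
    mi_term = -np.mean(log_q[np.arange(len(labels)), labels])
    return adv + mi_weight * mi_term, adv, mi_term

if __name__ == "__main__":
    # Toy annotation matrix: 3 instances, 4 annotators, 3 classes, -1 = missing.
    annotations = np.array([[0, -1, -1, 1],
                            [-1, 1, -1, -1],
                            [2, -1, 2, -1]])
    print("sparsity:", annotation_sparsity(annotations))
    total, adv, mi = generator_loss(np.array([0.9, 0.6]),
                                    np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]]),
                                    np.array([0, 2]))
    print("generator loss:", total, "adversarial:", adv, "MI surrogate:", mi)
```

In this sketch, `mi_weight` trades off realism against label informativeness; how the paper balances the two terms is not stated in the abstract.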

Research Organization: University of Virginia

Sponsoring Organization: USDOE

DOE Contract Number: EE0008227

OSTI ID: 1822655

Report Number(s): DOE-UVA-0008227-5; 1718216, 1553568

Journal Information: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Conference: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14-18, 2021

Country of Publication: United States

Language: English

Similar Records

Learning from Crowds by Modeling Common Confusions

Conference · February 9, 2021 · OSTI ID:1822655

Wang, Hongning; Chu, Zhendong; Ma, Jing

Learning from crowds with variational Gaussian processes

Journal Article · November 20, 2018 · Pattern Recognition · OSTI ID:1822655

Ruiz, Pablo; Morales-Alvarez, Pablo; Molina, Rafael; +1 more

SeismoGen: Seismic Waveform Synthesis Using GAN With Application to Seismic Data Augmentation

Journal Article · April 16, 2021 · Journal of Geophysical Research. Solid Earth · OSTI ID:1822655

Wang, Tiantong; Trugman, Daniel; Lin, Youzuo

Related Subjects

96 KNOWLEDGE MANAGEMENT AND PRESERVATION
Crowdsourcing
generative adversarial nets
label noise
