Understanding the Yarowsky Algorithm

Steven Abney
University of Michigan

Summary:
Many problems in computational linguistics are well suited for bootstrapping (semi-supervised
learning) techniques. The Yarowsky algorithm is a well-known bootstrapping algorithm,
but it is not mathematically well understood. This paper analyzes it as optimizing an
objective function. More specifically, a number of variants of the Yarowsky algorithm
(though not the original algorithm itself) are shown to optimize either likelihood or a
closely related objective function K.
1. Introduction
Bootstrapping, or semi-supervised learning, has become an important topic in computa-
tional linguistics. For many language-processing tasks, there is an abundance of unlabeled
data, but labeled data is lacking and too expensive to create in large quantities, making
bootstrapping techniques desirable.
The Yarowsky algorithm (Yarowsky, 1995) was one of the first bootstrapping algo-
rithms to become widely known in computational linguistics. The Yarowsky algorithm,
in brief, consists of two loops. The "inner loop" or base learner is a supervised learning
algorithm. Specifically, Yarowsky uses a simple decision list learner that considers rules
of the form, "If instance x contains feature f, then predict label j," and selects those
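The base learner described above can be illustrated with a minimal Python sketch: rules of the form "if instance x contains feature f, predict label j" are scored by smoothed precision on the currently labeled data, and one pass of the outer loop relabels unlabeled instances whose best matching rule is confident enough. This is an illustrative assumption-laden sketch, not Yarowsky's exact procedure; the smoothing scheme, the confidence threshold, and the binary-label assumption are choices made here for concreteness.

```python
from collections import defaultdict

def learn_decision_list(examples, labels, smoothing=0.1):
    """Score rules "if x contains feature f, predict label j" by smoothed precision.

    examples: list of feature sets; labels: parallel list of labels (None = unlabeled).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for x, y in zip(examples, labels):
        if y is None:
            continue
        for f in x:
            counts[f][y] += 1
    rules = []
    num_labels = 2  # assumption: a binary (two-sense) task
    for f, by_label in counts.items():
        total = sum(by_label.values())
        for j, c in by_label.items():
            score = (c + smoothing) / (total + smoothing * num_labels)
            rules.append((score, f, j))
    rules.sort(reverse=True)  # most precise rules fire first
    return rules

def predict(rules, x, threshold=0.6):
    """Apply the first sufficiently confident rule whose feature occurs in x."""
    for score, f, j in rules:
        if f in x and score >= threshold:
            return j
    return None  # abstain when no confident rule matches

def yarowsky_step(labeled, unlabeled, threshold=0.6):
    """One outer-loop iteration: train the base learner on the labeled set,
    then add confidently labeled instances from the unlabeled pool."""
    rules = learn_decision_list([x for x, _ in labeled], [y for _, y in labeled])
    newly = [(x, predict(rules, x, threshold)) for x in unlabeled]
    return labeled + [(x, y) for x, y in newly if y is not None]
```

On a toy word-sense task (features are context words, labels are senses), a single step extends the labeled set with instances that share a high-precision feature with the seeds; iterating this step until no new instances are labeled gives the full bootstrapping loop.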


Source: Abney, Steven P. - School of Information, University of Michigan

