Understanding the Yarowsky Algorithm
Steven Abney
University of Michigan

Summary:
Many problems in computational linguistics are well suited for bootstrapping (semi-supervised
learning) techniques. The Yarowsky algorithm is a well-known bootstrapping algorithm,
but it is not mathematically well understood. This paper analyzes it as optimizing an
objective function. More specifically, a number of variants of the Yarowsky algorithm
(though not the original algorithm itself) are shown to optimize either likelihood or a
closely related objective function K.
1. Introduction
Bootstrapping, or semi-supervised learning, has become an important topic in computational linguistics. For many language-processing tasks, there is an abundance of unlabeled data, but labeled data is scarce and too expensive to create in large quantities, making bootstrapping techniques desirable.
The Yarowsky algorithm (Yarowsky, 1995) was one of the first bootstrapping algorithms to become widely known in computational linguistics. The Yarowsky algorithm, in brief, consists of two loops. The "inner loop" or base learner is a supervised learning algorithm. Specifically, Yarowsky uses a simple decision list learner that considers rules of the form, "If instance x contains feature f, then predict label j," and selects those
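The decision-list base learner described above can be sketched roughly as follows. This is an illustrative sketch only: the smoothed-precision scoring and the function names are assumptions for exposition, not details taken from the paper.

```python
from collections import defaultdict

def learn_decision_list(examples, smoothing=0.1):
    """Learn rules "if instance contains feature f, predict label j".

    Each rule is scored by smoothed precision on the labeled examples
    (an assumed scoring choice for this sketch), and the rules are
    returned sorted with the highest-scoring rule first.
    """
    counts = defaultdict(lambda: defaultdict(float))  # feature -> label -> count
    labels = set()
    for features, label in examples:
        labels.add(label)
        for f in features:
            counts[f][label] += 1
    rules = []
    for f, by_label in counts.items():
        total = sum(by_label.values())
        for j in labels:
            # smoothed precision of the rule (f -> j)
            score = (by_label[j] + smoothing) / (total + smoothing * len(labels))
            rules.append((score, f, j))
    rules.sort(reverse=True)
    return rules

def predict(rules, features, default=None):
    """Apply the first (highest-scoring) rule whose feature is present."""
    for score, f, j in rules:
        if f in features:
            return j
    return default
```

In the full algorithm, this learner sits inside an outer loop: the current decision list labels some unlabeled instances, those new labels are added to the training set, and the list is retrained.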


Source: Abney, Steven P. - School of Information, University of Michigan

 

Collections: Computer Technologies and Information Sciences