Wide-Coverage Probabilistic Sentence
Matthew W. Crocker1,2 and Thorsten Brants1

Summary:
This paper describes a fully implemented, broad-coverage model of human syntactic processing. The
model uses probabilistic parsing techniques, which combine phrase structure, lexical category, and
limited subcategory probabilities with an incremental, left-to-right "pruning" mechanism based on
cascaded Markov models. The parameters of the system are established through a uniform training
algorithm, which determines maximum-likelihood estimates from a parsed corpus. The probabilistic
parsing mechanism enables the system to achieve good accuracy on typical, "garden-variety" language
(i.e., when tested on corpora). Furthermore, the incremental probabilistic ranking of the preferred
analyses during parsing also naturally explains observed human behavior for a range of
garden-path structures. We do not make strong psychological claims about the specific probabilistic
mechanism discussed here, which is limited by a number of practical considerations. Rather, we
argue incremental probabilistic parsing models are, in general, extremely well suited to explaining
this dual nature--generally good and occasionally pathological--of human linguistic performance.
KEY WORDS: probabilistic parsing; frequency; Markov models.
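
The two ingredients named in the summary, maximum-likelihood estimation from a parsed corpus and pruning of ranked analyses, can be illustrated with a minimal sketch. It is not the authors' implementation. The toy trees, the function names (`rules`, `mle_rule_probs`, `prune`), and the `beam` threshold are all assumptions made for illustration, and the simple relative-probability beam shown here only stands in for the cascaded Markov model pruning the paper describes.

```python
from collections import Counter

# Toy "parsed corpus", invented for illustration only.
# A tree is (label, child, ...); a leaf word is a plain string.
TREES = [
    ("S", ("NP", "dogs"), ("VP", ("V", "bark"))),
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
]

def rules(tree):
    """Yield (lhs, rhs) for every local tree; rhs holds child labels or words."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def mle_rule_probs(trees):
    """Relative-frequency (maximum-likelihood) estimate of P(rhs | lhs)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

def prune(analyses, beam=0.1):
    """Keep analyses whose probability is at least `beam` times the best one.

    `analyses` is a list of (analysis, probability) pairs. This plain beam
    is a hypothetical stand-in, not the cascaded Markov model mechanism.
    """
    best = max(p for _, p in analyses)
    return [(a, p) for a, p in analyses if p >= beam * best]

PROBS = mle_rule_probs(TREES)
```

In this sketch an analysis that falls outside the beam at some word is discarded and cannot be recovered later, which is one simple way to picture how an incremental ranking could yield garden-path behavior.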
Theories of human sentence processing have largely been shaped by the study
of pathologies in human sentence processing. The principles and parsing


Source: Alishahi, Afra - Department of Computational Linguistics and Phonetics, Universität des Saarlandes


Collections: Computer Technologies and Information Sciences