Multi-Align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT
Necip Fazil Ayan, Bonnie J. Dorr, Nizar Habash
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742
Department of Computer Science, Columbia University, New York, NY 10027
{nfa,bonnie,habash}@umiacs.umd.edu, habash@cs.columbia.edu
Abstract. An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena, such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre or domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.