Summary: ON THE SCALABILITY OF THE ANSWER EXTRACTION SYSTEM "EXTRANS"
Diego Mollá and Michael Hess
There have been many attempts in the history of Information Retrieval (IR) to add linguistic
capabilities to standard IR systems in order to improve their performance (mainly, their precision).¹
These attempts have not been very successful so far, at least not in the standard IR settings (cf. [7]).
The two main reasons are the (related but not identical) problems of data volume and of scalability.
First, the volume of data typically processed by IR systems is so large that the use of more than a few
isolated linguistic components has seemed out of the question, and linguistic components do not work
well in isolation. Second, NLP systems that work reasonably well in small-scale laboratory contexts
will often not scale up to real-world domains such as those in which IR is typically used. Both of
these points seem to all but rule out the use of full-fledged NLP methods in standard text retrieval
applications.
For some specific applications, however, high recall and precision are even more crucial than in IR, yet
the volumes of data to process are much smaller. These applications include interfaces to
machine-readable technical manuals, online help systems for complex software (such as operating systems),
help desk systems in large organisations, and public inquiry systems accessible over the Internet. In
all these applications the document collections to be accessed are just a few hundred megabytes in
size at most. The users of these applications do not want a set of complete documents, each one
possibly dozens of pages long, as in standard IR. What they want is a few highly specific answers to