ON THE SCALABILITY OF THE ANSWER EXTRACTION SYSTEM "EXTRANS"
 

Diego Mollá, Michael Hess
There have been many attempts in the history of Information Retrieval (IR) to add some linguistic
capabilities to standard IR systems in order to improve their performance (mainly, their precision).
These attempts have not been very successful so far, at least not in the standard IR settings (cf. [7]).
The two main reasons are the (related but not identical) problems of data volume and of scalability.
First, the volume of data typically processed by IR systems is so large that the use of more than a few
isolated linguistic components has seemed out of the question, and linguistic components do not work
well in isolation. Second, NLP systems that work reasonably well in small-scale laboratory contexts
will often not scale up to real-world domains like those for which IR is standardly used. Both of
these points seem to all but rule out the use of full-fledged NLP methods in standard text retrieval
applications.
For some specific applications, however, high recall and precision are even more crucial than in IR, yet
the volumes of data to process are much smaller. These applications include interfaces to
machine-readable technical manuals, on-line help systems for complex software (such as operating systems),
help desk systems in large organisations, and public inquiry systems accessible over the Internet. In
all these applications the document collections to be accessed are just a few hundred megabytes in
size at most. The users of these applications do not want a set of complete documents, each one
possibly dozens of pages long, as in standard IR. What they want is a few highly specific answers to

  

Source: Aliod, Diego Mollá - Department of Computing, Macquarie University

 

Collections: Computer Technologies and Information Sciences