Summary: IRENA: Information Retrieval Engine based on
Natural language Analysis
A.T. Arampatzis, C.H.A. Koster, T. Tsoris
University of Nijmegen, CSI, Postbus 9010, 6500 GL Nijmegen, The Netherlands.
Tel: +31 24 3653147, Fax: +31 24 3553450
The experimental IRENA system was developed to study how Natural Language Processing (NLP)
techniques can improve precision and recall in document retrieval systems. The NLP component
deals with the morphological, lexical and syntactic aspects of the English language. For the
syntactic analysis of both queries and documents, the power of the AGFL formalism was explored
by using it to describe and develop a syntactic analyzer for the English noun phrase, backed by
a large lexicon. The noun phrase co-occurrence hypothesis was formulated and tested as a new
relevance criterion for achieving high levels of precision. Furthermore, the problem of
calculating recall in non-indexed collections was partially solved by introducing a new
measure, relative recall.
The system was tested on a small corpus of English-language documents about pop music. The
results of this experiment are reported, and some conclusions are drawn about the viability of
the techniques.
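This summary does not define relative recall. For reference only, the following sketch assumes the common pooled definition. The full set of relevant documents in a non-indexed collection is unknown, so each run's recall is measured against the pool of relevant documents that any of the compared runs found. IRENA's actual definition may differ. The function names `precision` and `relative_recall` and the run labels are hypothetical.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are judged relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def relative_recall(runs: dict[str, set[str]], judged_relevant: set[str]) -> dict[str, float]:
    """Recall of each run measured against a pooled set of relevant documents.

    ASSUMPTION (not taken from the IRENA paper): the pool is the union of all
    relevant documents retrieved by at least one run. It stands in for the
    unknown complete set of relevant documents in the collection.
    """
    # Pool every relevant document that at least one run found.
    pool = set().union(*(docs & judged_relevant for docs in runs.values()))
    if not pool:
        return {name: 0.0 for name in runs}
    return {name: len(docs & pool) / len(pool) for name, docs in runs.items()}


if __name__ == "__main__":
    # Hypothetical runs: a baseline and a noun-phrase co-occurrence filter.
    runs = {"baseline": {"d1", "d2", "d3"}, "np_cooc": {"d2", "d4"}}
    judged = {"d2", "d3", "d4"}
    for name, docs in runs.items():
        print(name, precision(docs, judged), relative_recall(runs, judged)[name])
```

In this example the pool is {d2, d3, d4}. Both runs reach a relative recall of 2/3, and their precision is 2/3 and 1.0 respectively. Because relevant documents that every run missed never enter the pool, pooled relative recall is useful for comparing runs with each other. It can overestimate their true recall.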
In the approach taken, all linguistic knowledge is encapsulated in the grammar and the