Summary: An Evaluation of Linguistically-motivated Indexing Schemes
Avi Arampatzis Th.P. van der Weide C.H.A. Koster P. van Bommel
Technical Report CSI-R9927, December 1999, Dept. of Information Systems and Information Retrieval,
University of Nijmegen, The Netherlands.
{avgerino,tvdw,kees,pvb}@cs.kun.nl
Submitted to BCS-IRSG 2000
December 14, 1999
Abstract
In this article, we describe a number of indexing experiments based on indexing terms other than simple keywords.
These experiments were conducted as one step in validating a linguistically-motivated indexing model. The problem
is important but not new; what is new in this approach is the variety of schemes evaluated. It is important since it
should help to overcome not only the well-known problems of bag-of-words representations, but also the difficulties
raised by non-linguistic text simplification techniques such as stemming, stop-word deletion, and term selection. Our
approach to the selection of terms is based on part-of-speech tagging and shallow parsing. The indexing schemes
evaluated range from simple keywords to nouns, verbs, adverbs, adjectives, adjacent word-pairs, and head-modifier
pairs. Our findings apply to Information Retrieval and most related areas.
1 Introduction
The purpose of an automated information seeking system is to process information sources, and provide users with the
information they need. The particular nature of an information seeking process is determined by the characteristics
of information needs and information sources, such as the change rate. For instance, Information Retrieval assumes a