Summary: Term Selection for Filtering based on Distribution of Terms over Time
Avi T. Arampatzis Th.P. van der Weide C.H.A. Koster P. van Bommel
Faculty of Mathematics and Computing Science, University of Nijmegen,
Toernooiveld 1, NL-6525 ED Nijmegen, The Netherlands.
tel: +31 24 3653147, fax: +31 24 3553450
Email: {avgerino|tvdw|kees|pvb}@cs.kun.nl
In: Proceedings of RIAO'2000, 12-14 April 2000, Paris.
Abstract
In this article we investigate the use of time distributions in retrieval tasks. Specifically, we introduce a
novel term selection method, namely Term Occurrence Uniformity (TOU), based on the hypothesis that
terms which occur uniformly in time are more valuable than others. Our empirical evaluation so far has
neither proved nor disproved this hypothesis. However, results are promising and suggest the need for a
deeper theoretical and empirical investigation. Our current concern is filtering, but this line of research
may easily be extended to other retrieval tasks which involve temporally-dependent data.
1 Introduction
Information Filtering is the process of searching in large amounts of data for information which matches
a user information need. The filtering task is usually described as the inverse of the traditional retrieval
task. In retrieval, a one-time user request (called query) is matched to a static collection of information
objects. In filtering, users issue a long-term request (called profile) which is compared to a dynamic
collection, for instance, a stream of arriving information objects. Filtering may also be seen as a bi