Summary: Improving the Accessibility of SGML Documents –
A Content-analytical Approach*
Helena Ahonen Barbara Heikkinen
Oskari Heinonen Mika Klemettinen
{hahonen, bheikkin, oheinone, mklemett}@cs.helsinki.fi
Department of Computer Science, University of Helsinki
P.O. Box 26, FIN–00014 University of Helsinki, Finland
Abstract
Document retrieval based on string searches typically returns either the whole
document or just the occurrences of the searched words. What the user is often after,
however, is a microdocument: a part of the document that contains the occurrences
and is reasonably self-contained.
These microdocuments might, for instance, consist of several successive text
paragraphs sharing a mutual subject. Single paragraphs, or corresponding close-to-leaf
SGML elements, do not convey enough of the contextual information. On the other
hand, sections or subsections of a text document, such as a book or an article, can
discuss many heterogeneous topics, and thus be too large a unit for retrieval or
assembly.
We claim that such microdocuments are both suitable retrievable units and
appropriate units for document assembly, and that they can be reasonably well located