Combining Linguistic and Spatial Information for Document Analysis

Summary: Combining Linguistic and Spatial Information for Document Analysis
Marco Aiello (1,2), Christof Monz (1) and Leon Todoran (2)
(1) Institute for Logic, Language and Computation (ILLC)
(2) Intelligent Sensory and Information Systems (ISIS)
University of Amsterdam, The Netherlands
E-mail: {aiellom,christof,todoran}@wins.uva.nl
We present a framework to analyze color documents of complex layout. No assumption is made on the
layout. Our framework combines, in a content-driven bottom-up approach, two different sources of
information: textual and spatial. To analyze the text, shallow natural language processing tools,
such as taggers and partial parsers, are used. To infer relations of the logical layout, we resort
to a qualitative spatial calculus closely related to Allen's calculus. We evaluate the system against
documents from a color journal and present the results of extracting the reading order from the
journal's pages. In this case, our analysis is successful as it extracts the intended reading order from the
The idea behind the first attempts to create digital libraries from paper documents was to archive
the scanned documents directly. Besides the problem of huge memory requirements, the process of retrieval
was difficult and unsatisfactory, especially because the indexing method was based on the file name
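
The summary mentions a qualitative spatial calculus closely related to Allen's calculus but does not spell it out in this excerpt. As a rough illustration only, the sketch below classifies two intervals into one of Allen's thirteen interval relations and applies that to the x and y extents of two bounding boxes. The names allen_relation and box_relation, the box layout (x1, y1, x2, y2), and the sample coordinates are assumptions made for this example and are not taken from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]           # (start, end), assuming start < end
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen interval relation that holds from a to b."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only the two proper-overlap cases remain at this point.
    return "overlaps" if a1 < b1 else "overlapped-by"


def box_relation(p: Box, q: Box) -> Tuple[str, str]:
    """Pair of Allen relations between two boxes: (x axis, y axis)."""
    return (
        allen_relation((p[0], p[2]), (q[0], q[2])),
        allen_relation((p[1], p[3]), (q[1], q[3])),
    )


if __name__ == "__main__":
    title = (10, 10, 200, 30)       # made-up coordinates, y grows downward
    body = (10, 40, 200, 300)
    left_col = (10, 40, 100, 300)
    right_col = (110, 40, 200, 300)
    print(box_relation(title, body))          # ('equals', 'before')
    print(box_relation(left_col, right_col))  # ('before', 'equals')
```

In this toy setting, a pair such as ('equals', 'before') says that one block spans the same horizontal extent as another and lies entirely above it, which is the kind of qualitative fact a reading-order analysis could build on. How the authors actually derive reading order from such relations is not described in this excerpt.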


Source: Aiello, Marco - Institute for Mathematics and Computing Science, Rijksuniversiteit Groningen


Collections: Computer Technologies and Information Sciences