Logical Structure Detection for Heterogeneous Document Classes

Summary: Logical Structure Detection for Heterogeneous Document Classes
Leon Todoran (a), Marco Aiello (a, b), Christof Monz (b), Marcel Worring (a)
(a) Intelligent Sensory Information Systems, Univ. of Amsterdam, The Netherlands
(b) Institute for Logic, Language and Computation, Univ. of Amsterdam, The Netherlands
We present a fully implemented system based on generic document knowledge for detecting the logical structure of
documents for which only general layout information is assumed. In particular, we focus on detecting the reading
order. Our system integrates components based on computer vision, artificial intelligence, and natural language
processing techniques. The prominent feature of our framework is its ability to handle documents from heterogeneous
collections. The system has been evaluated on a standard collection of documents to measure the quality of the reading
order detection. Experimental results for each component and the system as a whole are presented and discussed
in detail. The performance of the system is promising, especially when considering the diversity of the document
Keywords: Document Analysis, Logical Structure Detection, Reading Order Detection, Natural Language Processing, Spatial Reasoning.
The goal of document analysis is to automatically process scanned documents and convert them into a digital
format, which can for example be further processed for reproduction, digital libraries, information retrieval, and


Source: Aiello, Marco - Institute for Mathematics and Computing Science, Rijksuniversiteit Groningen


Collections: Computer Technologies and Information Sciences