Logical Structure Detection for Heterogeneous Document Classes
 

Leon Todoran (a), Marco Aiello (a, b), Christof Monz (b), Marcel Worring (a)
(a) Intelligent Sensory Information Systems, Univ. of Amsterdam, The Netherlands
(b) Institute for Logic, Language and Computation, Univ. of Amsterdam, The Netherlands
ABSTRACT
We present a fully implemented system based on generic document knowledge for detecting the logical structure of
documents for which only general layout information is assumed. In particular, we focus on detecting the reading
order. Our system integrates components based on computer vision, artificial intelligence, and natural language
processing techniques. The prominent feature of our framework is its ability to handle documents from heterogeneous
collections. The system has been evaluated on a standard collection of documents to measure the quality of the reading
order detection. Experimental results for each component and the system as a whole are presented and discussed
in detail. The performance of the system is promising, especially when considering the diversity of the document
collection.
Keywords: Document Analysis, Logical Structure Detection, Reading Order Detection, Natural Language Processing,
Spatial Reasoning.
1. INTRODUCTION
The goal of document analysis is to automatically process scanned documents and convert them into a digital
format, which can for example be further processed for reproduction, digital libraries, information retrieval, and

  

Source: Aiello, Marco - Institute for Mathematics and Computing Science, Rijksuniversiteit Groningen

 

Collections: Computer Technologies and Information Sciences