A Realistic Dataset for Performance Evaluation of Document Layout Analysis
A. Antonacopoulos, D. Bridson, C. Papadopoulos and S. Pletschacher
Pattern Recognition and Image Analysis (PRImA) Research Lab
School of Computing, Science and Engineering, University of Salford, Greater Manchester, United Kingdom
http://www.primaresearch.org
Abstract
There is a significant need for a realistic dataset on
which to evaluate layout analysis methods and examine
their performance in detail. This paper presents a new
dataset (and the methodology used to create it) based on a
wide range of contemporary documents. Strong emphasis
is placed on comprehensive and detailed representation of
both complex and simple layouts, and on colour originals.
In-depth information is recorded both at the page and
region level. Ground truth is efficiently created using a new
semi-automated tool and stored in a new comprehensive
XML representation, the PAGE format. The dataset can
be browsed and searched via a web-based front end to the
underlying database and suitable subsets (relevant to
specific evaluation goals) can be selected and downloaded.

  

Source: Antonacopoulos, Apostolos - School of Computing, Science and Engineering, University of Salford

 

Collections: Computer Technologies and Information Sciences