Summary: Flexible Text Recovery from Degraded Typewritten Historical Documents
A. Antonacopoulos and C. Casado Castilla
Pattern Recognition and Image Analysis (PRImA) Research Lab
School of Computing, Science and Engineering, University of Salford
Greater Manchester, M5 4WT, United Kingdom
http://www.primaresearch.org
This work has been supported in part through the EU grant IST-2001-33441.
Abstract
The conversion of large collections of historical
typewritten documents into digital libraries and archives
is met with significant challenges that standard
recognition techniques cannot address. The condition and
individual nature of characters in these degraded
documents necessitate a departure from existing
thresholding approaches. This paper presents an approach
designed to overcome the difficulties posed by such
documents by flexibly analysing each individual
character and cautiously repairing it. The main sources
of OCR errors are successfully addressed and reliable