A Self-Adaptive Method for Extraction of Document-Specific Alphabets
Stefan Pletschacher
Pattern Recognition and Image Analysis (PRImA) Research Lab
School of Computing, Science and Engineering, University of Salford, Greater Manchester,
United Kingdom
s.pletschacher@primaresearch.org
Abstract
Recognition and encoding of digitized historical
documents remains a challenging task. A major problem
is the occurrence of unknown glyphs and symbols that
might not even exist in modern alphabets. Current
pre-trained OCR methods rarely deliver usable results
for such documents. This paper describes an alternative
approach and framework for handling printed historical
documents without restrictions on the contained
alphabets or fonts. The basic idea is to derive all
information required for encoding directly from the
document itself. This is achieved by extracting a
document-specific prototype alphabet of locatable
glyphs. Core of the system is a

  

Source: Antonacopoulos, Apostolos - School of Computing, Science and Engineering, University of Salford

 

Collections: Computer Technologies and Information Sciences