OSTI.GOV
U.S. Department of Energy, Office of Scientific and Technical Information

Title: Optical character recognition of handwritten Arabic using hidden Markov models

Conference · OSTI ID: 1081713

Optical character recognition (OCR) of handwritten Arabic has not yet been solved satisfactorily. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which yields improved and more robust recognition of characters at the sub-word level. Integrating HMMs is another step in the OCR trends currently being researched in the literature. The proposed approach exploits the structure of characters in the Arabic language, in addition to their extracted features, to achieve improved recognition rates. Statistical information about the Arabic language is first extracted and then used to estimate the probabilistic parameters of the HMM. This study develops a custom HMM implementation in which the transition matrix is built from a large collected corpus and the emission matrix is built from the results obtained with the extracted character features. Recognition is carried out with the Viterbi algorithm, which selects the most probable sequence of sub-words. The model was implemented to recognize Arabic text at the sub-word level, so that the recognition rate depends on the overall structure of the Arabic language rather than being tied to the worst recognition rate of any single character. Numerical results indicate that the proposed algorithms can yield a potentially large improvement in recognition.
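The pipeline summarized in the abstract (corpus statistics for the transition matrix, feature-based scores for the emission matrix, Viterbi decoding) can be illustrated with a minimal sketch. This is not the paper's implementation, which the abstract does not detail. It assumes that hidden states are individual characters, that transitions come from smoothed character-bigram counts, and that a hypothetical feature classifier supplies a strictly positive likelihood for each character at each position. The function names, the add-one smoothing, and the Latin placeholder alphabet are all illustrative assumptions.

```python
import math
from collections import Counter


def estimate_transitions(corpus, alphabet, smoothing=1.0):
    """Estimate start and transition probabilities from character bigram counts.

    Assumption: add-`smoothing` (Laplace) smoothing; the paper's estimator is not given.
    """
    start_counts = Counter(word[0] for word in corpus if word)
    pair_counts = Counter(
        (word[i], word[i + 1]) for word in corpus for i in range(len(word) - 1)
    )
    n = len(alphabet)
    start_total = sum(start_counts[c] for c in alphabet) + smoothing * n
    start = {c: (start_counts[c] + smoothing) / start_total for c in alphabet}
    trans = {}
    for prev in alphabet:
        row_total = sum(pair_counts[(prev, c)] for c in alphabet) + smoothing * n
        trans[prev] = {
            c: (pair_counts[(prev, c)] + smoothing) / row_total for c in alphabet
        }
    return start, trans


def viterbi(emissions, start, trans, alphabet):
    """Return the most probable character sequence, computed in log space.

    emissions[t][c] is the (strictly positive) likelihood of the features
    observed at position t given character c, e.g. from a feature classifier.
    """
    if not emissions:
        return []
    log = math.log
    score = {c: log(start[c]) + log(emissions[0][c]) for c in alphabet}
    back = []  # back[t-1][c] = best predecessor of character c at position t
    for obs in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in alphabet:
            best_prev = max(alphabet, key=lambda p: score[p] + log(trans[p][cur]))
            new_score[cur] = score[best_prev] + log(trans[best_prev][cur]) + log(obs[cur])
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(alphabet, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy usage with a placeholder two-letter alphabet (hypothetical data).
alphabet = ["a", "b"]
start, trans = estimate_transitions(["ab"] * 4, alphabet)
emissions = [{"a": 0.9, "b": 0.1}, {"a": 0.6, "b": 0.4}]
print(viterbi(emissions, start, trans, alphabet))  # ['a', 'b']
```

In this toy case the per-character scores alone would pick "a" at the second position, but the corpus-derived transition probabilities favor "b" after "a", so the decoded sequence is "ab". This shows in miniature how language structure can offset a weak recognition score for a single character.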

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
Work for Others (WFO)
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1081713
Resource Relation:
Conference: SPIE Defense, Security, and Sensing, Orlando, FL, USA, April 25-29, 2011
Country of Publication:
United States
Language:
English

Similar Records

Recognition of Handwritten Arabic words using a neuro-fuzzy network
Journal Article · June 12, 2008 · AIP Conference Proceedings · OSTI ID:1081713

A paper form processing system with an error correcting function for reading handwritten Kanji strings
Technical Report · December 31, 1994 · OSTI ID:1081713

Grinding Wheel Condition Monitoring with Hidden Markov Model-Based Clustering Methods
Journal Article · January 1, 2006 · Machining Science and Technology · OSTI ID:1081713