Lexicon-based word recognition without word segmentation

Myers, G K; Chen, C H

Technical Report · Sat Dec 31 00:00:00 EST 1994

OSTI ID:68574

Advanced Automation Technology Center, Menlo Park, CA (United States)

We present a word recognition approach that does not rely on explicit word segmentation. It treats the character recognition output as a continuous string of characters instead of first dividing it into words before word-level contextual knowledge is applied. This technique is useful for degraded document images, in which isolation of individual words by purely image- or character-based means is difficult or unreliable. We use a hypothesis generation and verification approach, in which word identities and their positions are hypothesized based on "seed features" (character substrings) extracted from the output of the character recognizer. Verification of the hypotheses consists of comparing the characters in the hypothesized word with candidate characters near the position of the seed feature in the text, and selecting the set of consecutive word hypotheses that are the most mutually consistent. Hence, word segmentation and word recognition are effectively performed in parallel.
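The generate-and-verify idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no algorithmic detail, so everything here is an assumption made for illustration. The sketch assumes that seed features are fixed-length substrings (length k = 3), that the recognizer returns a single top-choice character per position with substitution errors only, that verification is a simple fraction of matching characters against a threshold, and that "most mutually consistent" means the non-overlapping set of hypotheses with the largest total number of matched characters. The function names and the example lexicon are hypothetical.

```python
from collections import defaultdict


def build_seed_index(lexicon, k=3):
    """Map each length-k substring of each lexicon word to (word, offset) pairs."""
    index = defaultdict(set)
    for word in lexicon:
        for off in range(len(word) - k + 1):
            index[word[off:off + k]].add((word, off))
    return index


def generate_hypotheses(text, index, k=3, min_score=0.7):
    """Hypothesize a word wherever a seed occurs, then verify it against the text."""
    hyps = set()
    for i in range(len(text) - k + 1):
        for word, off in index.get(text[i:i + k], ()):
            start, end = i - off, i - off + len(word)
            if start < 0 or end > len(text):
                continue  # hypothesized word would run off the text
            # Verification (assumed rule): count position-wise character matches.
            matches = sum(a == b for a, b in zip(word, text[start:end]))
            if matches / len(word) >= min_score:
                hyps.add((start, end, word, matches))
    return hyps


def select_consistent(text_len, hyps):
    """Choose non-overlapping hypotheses maximizing total matched characters."""
    by_end = defaultdict(list)
    for h in hyps:
        by_end[h[1]].append(h)
    # best[p] = (score, chosen words) for the prefix text[:p]
    best = [(0, [])] * (text_len + 1)
    for p in range(1, text_len + 1):
        best[p] = best[p - 1]  # option: leave character p-1 uncovered
        for start, _end, word, matches in by_end[p]:
            score = best[start][0] + matches
            if score > best[p][0]:
                best[p] = (score, best[start][1] + [(start, word)])
    return best[text_len][1]


# Hypothetical example: no spaces, and one substitution error ("g" read as "q").
lexicon = ["word", "recognition", "without", "segmentation"]
text = "wordrecoqnitionwithoutsegmentation"

hyps = generate_hypotheses(text, build_seed_index(lexicon))
result = select_consistent(len(text), hyps)
print(result)
# [(0, 'word'), (4, 'recognition'), (15, 'without'), (22, 'segmentation')]
```

In this sketch the word boundaries fall out of the selection step rather than being computed beforehand, which mirrors the abstract's point that segmentation and recognition happen together. A fuller treatment would need to handle multiple candidate characters per position and insertion or deletion errors, neither of which is modeled here.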

OSTI does not have a digital full text copy available.

Research Organization: Nevada Univ., Las Vegas, NV (United States)

OSTI ID: 68574

Report Number(s): CONF-9404212-; TRN: 95:004349-0014

Resource Relation: Conference: 3. annual symposium on document analysis and information retrieval, Las Vegas, NV (United States), 11-13 Apr 1994; Other Information: PBD: 1994; Related Information: Is Part Of Third Annual Symposium on Document Analysis and Information Retrieval; PB: 484 p.

Country of Publication: United States

Language: English

Similar Records

Optical character recognition of handwritten Arabic using hidden Markov models

Conference · Sat Jan 01 00:00:00 EST 2011 · OSTI ID:68574

Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.; +1 more

A paper form processing system with an error correcting function for reading handwritten Kanji strings

Technical Report · Sat Dec 31 00:00:00 EST 1994 · OSTI ID:68574

Marukawa, Katsumi; Nakashima, Kazuki; Koga, Masashi; +2 more

Modeling words with subword units in an articulatorily constrained speech recognition algorithm

Technical Report · Thu Nov 20 00:00:00 EST 1997 · OSTI ID:68574

Hogden, J

Related Subjects

99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS
OPTICAL SCANNERS
PERFORMANCE
PATTERN RECOGNITION
PROBABILITY
ACCURACY
