Validation of document image defect models for optical character recognition

Li, Y; Lopresti, D; Tomkins, A

Technical Report · Fri Dec 30 23:00:00 EST 1994

OSTI ID:68571

Panasonic Technologies, Inc., Princeton, NJ (United States)

In this paper we consider the problem of evaluating models for physical defects affecting the optical character recognition (OCR) process. While a number of such models have been proposed, the contention that they produce the desired result is typically argued in an ad hoc and informal way. We introduce a rigorous and more pragmatic definition of when a model is accurate: we say a defect model is validated if the OCR errors induced by the model are effectively indistinguishable from the errors encountered when using real scanned documents. We present two measures to quantify this similarity: the Vector Space method and the Coin Bias method. The former adapts an approach used in information retrieval; the latter simulates an observer attempting to do better than a "random" guesser. We compare and contrast the two techniques based on experimental data; both seem to work well, suggesting this is an appropriate formalism for the development and evaluation of document image defect models.
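
The abstract does not give the formulation of either measure, so the following is only an illustrative sketch of the general idea behind a vector-space comparison as it is commonly done in information retrieval: represent each set of OCR errors as a vector of error-type counts and compare the two vectors by cosine similarity. The function name, the error categories, and all counts are hypothetical and are not taken from the report.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors.

    Assumption: each vector maps an OCR error type, written here as a
    (true_text, recognized_text) pair, to how often it was observed.
    """
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)  # Counter yields 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty error profile is treated as dissimilar
    return dot / (norm_a * norm_b)


# Hypothetical error counts: OCR run on real scans vs. on model-degraded images.
real_errors = Counter({("e", "c"): 12, ("l", "1"): 7, ("m", "rn"): 3})
synthetic_errors = Counter({("e", "c"): 10, ("l", "1"): 8, ("o", "0"): 2})

similarity = cosine_similarity(real_errors, synthetic_errors)
print(f"similarity between error profiles: {similarity:.3f}")
```

Under this reading, a value close to 1 would mean the defect model induces roughly the same mix of errors as real scanning does. How the report itself builds its vectors, and what threshold it treats as "validated", cannot be determined from the abstract.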

Research Organization: Nevada Univ., Las Vegas, NV (United States)

OSTI ID: 68571

Report Number(s): CONF-9404212--

Country of Publication: United States

Language: English

Similar Records

An evaluation of information retrieval accuracy with simulated OCR output

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68569

An evaluation of an automatic markup system

Conference · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46721

Low-level structural recognition of documents

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68590

Related Subjects

99 GENERAL AND MISCELLANEOUS
ERRORS
IMAGE SCANNERS
PATTERN RECOGNITION
PERFORMANCE
VECTOR PROCESSING