Prediction of OCR accuracy using simple image features

Blando, L R; Kanai, Junichi; Nartker, T A

Technical Report · Fri Mar 31 23:00:00 EST 1995

OSTI ID: 46719

A classifier is presented for predicting the character accuracy that any Optical Character Recognition (OCR) system will achieve on a given page. The classifier is based on measuring the amount of white speckle, the amount of character fragments, and overall size information in the page. No output from the OCR system is used. The given page is classified as either good quality (i.e., high OCR accuracy expected) or poor quality (i.e., low OCR accuracy expected). Six OCR systems processed two different sets of test data: a set of 439 pages obtained from technical and scientific documents and a set of 200 pages obtained from magazines. For every system, approximately 85% of the pages in each data set were correctly predicted. The performance of this classifier is also compared with the ideal-case performance of a prediction method based upon the number of reject markers in OCR-generated text. In several cases, the classifier matched or exceeded the performance of the reject-based approach.
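The abstract does not give the feature definitions or decision rules, so the following is only a minimal sketch of the general idea, not the report's method. It assumes a binarized page (1 = black, 0 = white), treats small enclosed white regions as "white speckle" and small black regions as "character fragments", and uses the mean height of the remaining black regions as the size feature. All names (`components`, `page_features`, `classify_page`) and every threshold are hypothetical and chosen for this toy example.

```python
from collections import deque

# Hypothetical thresholds for this toy example only; none come from the report.
MAX_SPECKLE_AREA = 2    # enclosed white regions this small count as speckle
MAX_FRAGMENT_AREA = 2   # black regions this small count as fragments
SPECKLE_LIMIT = 0.2     # max tolerated share of speckle among white regions
FRAGMENT_LIMIT = 0.3    # max tolerated share of fragments among black regions
MIN_CHAR_HEIGHT = 3     # min mean height (in pixels) of non-fragment regions


def components(grid, value):
    """Return area, height and border contact of each 4-connected region of `value`."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            area, rmin, rmax, cmin, cmax = 0, r, r, c, c
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                area += 1
                rmin, rmax = min(rmin, y), max(rmax, y)
                cmin, cmax = min(cmin, x), max(cmax, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            on_border = rmin == 0 or cmin == 0 or rmax == rows - 1 or cmax == cols - 1
            found.append({"area": area, "height": rmax - rmin + 1, "on_border": on_border})
    return found


def page_features(grid):
    """Compute the three illustrative features for a binarized page."""
    white = components(grid, 0)
    black = components(grid, 1)
    speckle = [c for c in white if c["area"] <= MAX_SPECKLE_AREA and not c["on_border"]]
    fragments = [c for c in black if c["area"] <= MAX_FRAGMENT_AREA]
    whole = [c for c in black if c["area"] > MAX_FRAGMENT_AREA]
    return {
        "speckle_ratio": len(speckle) / max(len(white), 1),
        "fragment_ratio": len(fragments) / max(len(black), 1),
        "mean_height": sum(c["height"] for c in whole) / max(len(whole), 1),
    }


def classify_page(grid):
    """Label a page 'good' or 'poor' using the hypothetical thresholds above."""
    f = page_features(grid)
    poor = (f["speckle_ratio"] > SPECKLE_LIMIT
            or f["fragment_ratio"] > FRAGMENT_LIMIT
            or f["mean_height"] < MIN_CHAR_HEIGHT)
    return "poor" if poor else "good"


def parse(rows):
    """Turn strings of '0'/'1' into a grid of ints."""
    return [[int(ch) for ch in row] for row in rows]


# Two tiny made-up "pages": solid blobs versus a hollow blob plus stray pixels.
CLEAN = parse(["000000000000", "011100011100", "011100011100",
               "011100011100", "011100011100", "000000000000"])
NOISY = parse(["000000000000", "011100010000", "010100000100",
               "011100100000", "000000000010", "000000000000"])
```

With these made-up inputs, `classify_page(CLEAN)` returns `"good"` and `classify_page(NOISY)` returns `"poor"`. Consistent with the abstract, the sketch inspects only the page image and never consults OCR output.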

Research Organization: Nevada Univ., Las Vegas, NV (United States). Information Science Research Inst.

Sponsoring Organization: USDOE, Washington, DC (United States)

DOE Contract Number: FC08-90NV10872

Report Number(s): CONF-950226--34; ON: DE95009887

Country of Publication: United States

Language: English

Similar Records

An evaluation of information retrieval accuracy with simulated OCR output

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68569

Performance evaluation of two OCR systems

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68585

Validation of simulated OCR data sets

Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68570

Related Subjects

99 GENERAL AND MISCELLANEOUS
ACCURACY
ALGORITHMS
CLASSIFICATION
DATA BASE MANAGEMENT
DATA PROCESSING
EVALUATION
EXPERIMENTAL DATA
IMAGE PROCESSING
INFORMATION SYSTEMS
OPTICAL SCANNERS
PROBABILISTIC ESTIMATION
