U.S. Department of Energy
Office of Scientific and Technical Information

Validation of simulated OCR data sets

Technical Report · OSTI ID:68570
 [1]
  1. Rensselaer Polytechnic Institute, Troy, NY (United States)

Recent interest in synthetic data sets for improving classifier performance raises the question of whether pseudo-random defect models provide a good approximation to live data from an OCR perspective. A proposal is presented to evaluate artificial data sets by comparing the confusion matrices that a given classifier generates on scanned data and on synthesized data. The proposed measure applies, in principle, to both isolated character recognition and printed text. It is argued that the proposed method is more practical than direct comparison of synthetic data with real data.
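
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two confusion matrices are compared, so the distance used here (mean absolute difference of row-normalized confusion rates) is an assumption chosen for illustration, not the report's measure. The label set, the sample (true, predicted) pairs, and all function names are likewise hypothetical.

```python
from collections import Counter

def confusion_matrix(pairs, labels):
    """Count (true, predicted) pairs into a labels x labels table of counts."""
    counts = Counter(pairs)
    return [[counts[(t, p)] for p in labels] for t in labels]

def row_normalize(matrix):
    """Turn each row of counts into confusion rates; empty rows stay all zero."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

def matrix_distance(a, b):
    """Mean absolute difference between corresponding entries.

    Assumed measure for illustration only; the abstract does not name one.
    """
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

# Hypothetical label set of easily confused characters.
labels = ["O", "0", "l", "1"]

# Hypothetical classifier output as (true, predicted) pairs, made up for this sketch.
scanned = [("O", "O"), ("O", "0"), ("0", "0"), ("0", "O"),
           ("l", "l"), ("l", "1"), ("1", "1"), ("1", "1")]
synthetic = [("O", "O"), ("O", "O"), ("0", "0"), ("0", "O"),
             ("l", "l"), ("l", "1"), ("1", "1"), ("1", "l")]

real_cm = row_normalize(confusion_matrix(scanned, labels))
synth_cm = row_normalize(confusion_matrix(synthetic, labels))

# A smaller score means the classifier confuses characters in a more similar
# way on both data sets.
score = matrix_distance(real_cm, synth_cm)
print(f"{score:.4f}")
```

Under this assumed measure, a score of zero would mean the same classifier makes the same pattern of confusions on scanned and synthesized data. The sketch compares classifier behavior only and never compares the images themselves.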

Research Organization: Nevada Univ., Las Vegas, NV (United States)
OSTI ID: 68570
Report Number(s): CONF-9404212--
Country of Publication: United States
Language: English

Similar Records

Prediction of OCR accuracy using simple image features
Technical Report · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46719

Performance evaluation of two OCR systems
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68585

An evaluation of information retrieval accuracy with simulated OCR output
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68569