U.S. Department of Energy
Office of Scientific and Technical Information

Asymptotic accuracy of two-class discrimination

Technical Report · OSTI ID: 68583
Author Affiliation:
AT&T Bell Laboratories, Murray Hill, NJ (United States)
Poor-quality training data (e.g., sparse or unrepresentative samples) is widely suspected to be one cause of the disappointing accuracy of isolated-character classification in modern OCR machines. We conjecture that, for many trainable classification techniques, it is in fact the dominant factor affecting accuracy. To test this, we have carried out a study of the asymptotic accuracy of three dissimilar classifiers on a difficult two-character recognition problem. We state this problem precisely in terms of high-quality prototype images and an explicit model of the distribution of image defects. So stated, the problem can be represented as a stochastic source of an indefinitely long sequence of simulated images labeled with ground truth. Using this sequence, we were able to train all three classifiers to high and statistically indistinguishable asymptotic accuracies (99.9%). This result suggests that the quality of training data was the dominant factor affecting accuracy. The speed of convergence during training, as well as time/space trade-offs during recognition, differed among the classifiers.
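The setup described in the abstract (prototype images, an explicit defect model, and an unbounded labeled stream used for training) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the 5x5 bitmaps for "c" and "e", the independent pixel-flip defect model, the `flip_prob` value, and the nearest-mean classifier, which stands in for the report's three unnamed classifiers.

```python
import itertools
import random

# Hypothetical 5x5 prototypes for two confusable characters.
# Illustrative bitmaps only; not the report's prototype images.
PROTOTYPES = {
    "c": ["01110", "10000", "10000", "10000", "01110"],
    "e": ["01110", "10001", "11111", "10000", "01110"],
}


def to_bits(rows):
    """Flatten a list of '0'/'1' strings into a list of ints."""
    return [int(ch) for row in rows for ch in row]


def image_source(rng, flip_prob=0.05):
    """Yield an endless stream of (pixels, label) pairs.

    Assumed defect model: each pixel flips independently with
    probability flip_prob. The label is the ground truth.
    """
    labels = sorted(PROTOTYPES)
    clean = {name: to_bits(rows) for name, rows in PROTOTYPES.items()}
    while True:
        label = rng.choice(labels)
        pixels = [b ^ (rng.random() < flip_prob) for b in clean[label]]
        yield pixels, label


class NearestMean:
    """Stand-in classifier: assign the class whose mean image is closest."""

    def __init__(self, n_pixels):
        self.n_pixels = n_pixels
        self.sums = {}
        self.counts = {}

    def fit_one(self, pixels, label):
        sums = self.sums.setdefault(label, [0] * self.n_pixels)
        for i, p in enumerate(pixels):
            sums[i] += p
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, pixels):
        def sq_dist(label):
            count = self.counts[label]
            return sum((p - s / count) ** 2
                       for p, s in zip(pixels, self.sums[label]))
        return min(self.counts, key=sq_dist)


def accuracy_after(n_train, n_test=2000, seed=0):
    """Train on n_train stream samples, then score on n_test fresh ones."""
    rng = random.Random(seed)
    source = image_source(rng)
    clf = NearestMean(n_pixels=25)
    for pixels, label in itertools.islice(source, n_train):
        clf.fit_one(pixels, label)
    test = itertools.islice(source, n_test)
    hits = sum(clf.predict(pixels) == label for pixels, label in test)
    return hits / n_test


if __name__ == "__main__":
    # Accuracy as a function of training-set size drawn from the stream.
    for n in (2, 10, 100, 1000):
        print(n, accuracy_after(n))
```

Because the source never runs out, the training set can be grown until the measured accuracy stops improving, which is one way to read "asymptotic accuracy" in this toy setting. The figures this sketch prints depend entirely on the assumed prototypes and defect model and say nothing about the report's results.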
Research Organization:
Nevada Univ., Las Vegas, NV (United States)
OSTI ID:
68583
Report Number(s):
CONF-9404212--
Country of Publication:
United States
Language:
English

Similar Records

Prediction of OCR accuracy using simple image features
Technical Report · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46719

An evaluation of information retrieval accuracy with simulated OCR output
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68569

Adaptive image enhancement of text images that contain touching or broken characters
Technical Report · Mon Nov 28 23:00:00 EST 1994 · OSTI ID:42491