Asymptotic accuracy of two-class discrimination
Technical Report · OSTI ID:68583
- AT&T Bell Laboratories, Murray Hill, NJ (United States)
Poor-quality (e.g., sparse or unrepresentative) training data is widely suspected to be one cause of disappointing accuracy of isolated-character classification in modern OCR machines. We conjecture that, for many trainable classification techniques, it is in fact the dominant factor affecting accuracy. To test this, we have carried out a study of the asymptotic accuracy of three dissimilar classifiers on a difficult two-character recognition problem. We state this problem precisely in terms of high-quality prototype images and an explicit model of the distribution of image defects. So stated, the problem can be represented as a stochastic source of an indefinitely long sequence of simulated images labeled with ground truth. Using this sequence, we were able to train all three classifiers to high and statistically indistinguishable asymptotic accuracies (99.9%). This result suggests that the quality of training data was the dominant factor affecting accuracy. The speed of convergence during training, as well as time/space trade-offs during recognition, differed among the classifiers.
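The setup described in the abstract (prototype images, a defect model, and an endless labeled stream used for training) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the report: the 5x5 bitmaps in `PROTOTYPES`, the independent pixel-flip defect model, the `flip_prob` value, and the nearest-mean classifier are hypothetical stand-ins for the report's prototypes, its defect distribution, and its three classifiers.

```python
import itertools
import random

# Hypothetical 5x5 prototype bitmaps for two similar-looking classes
# (illustrative only; the report's prototypes are not reproduced here).
PROTOTYPES = {
    "c": ["01110", "10000", "10000", "10000", "01110"],
    "e": ["01110", "10001", "11111", "10000", "01110"],
}


def labeled_source(rng, flip_prob=0.1):
    """Yield an endless stream of (pixels, label) pairs with known ground truth."""
    labels = sorted(PROTOTYPES)
    while True:
        label = rng.choice(labels)
        clean = [int(ch) for row in PROTOTYPES[label] for ch in row]
        # Assumed defect model: each pixel flips independently with flip_prob.
        noisy = [bit ^ (rng.random() < flip_prob) for bit in clean]
        yield noisy, label


class NearestMeanClassifier:
    """Assign an image to the class whose running mean image is closest."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, pixels, label):
        if label not in self.sums:
            self.sums[label] = [0] * len(pixels)
            self.counts[label] = 0
        self.sums[label] = [s + p for s, p in zip(self.sums[label], pixels)]
        self.counts[label] += 1

    def predict(self, pixels):
        def sq_dist(label):
            n = self.counts[label]
            return sum((p - s / n) ** 2 for p, s in zip(pixels, self.sums[label]))

        # Only classes seen during training are candidates.
        return min(self.sums, key=sq_dist)


def accuracy_after(n_train, n_test=2000, seed=0, flip_prob=0.1):
    """Train on n_train (>= 1) stream samples, then score on n_test fresh ones."""
    rng = random.Random(seed)
    source = labeled_source(rng, flip_prob)
    clf = NearestMeanClassifier()
    for pixels, label in itertools.islice(source, n_train):
        clf.update(pixels, label)
    test = itertools.islice(source, n_test)
    correct = sum(clf.predict(pixels) == label for pixels, label in test)
    return correct / n_test


if __name__ == "__main__":
    # Accuracy as a function of training-set size drawn from the same source.
    for n in (2, 10, 100, 1000, 10000):
        print(n, accuracy_after(n))
```

Because the source never runs out, the training set can be made as large as needed, so any remaining error reflects the classifier and the defect model rather than a shortage of data. The accuracy this toy example levels off at depends entirely on the assumed bitmaps and `flip_prob`, and it says nothing about the 99.9% figure reported above.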
- Research Organization: Nevada Univ., Las Vegas, NV (United States)
- OSTI ID: 68583
- Report Number(s): CONF-9404212--
- Country of Publication: United States
- Language: English
Similar Records
Prediction of OCR accuracy using simple image features
Technical Report · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46719
An evaluation of information retrieval accuracy with simulated OCR output
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68569
Adaptive image enhancement of text images that contain touching or broken characters
Technical Report · Mon Nov 28 23:00:00 EST 1994 · OSTI ID:42491