An evaluation of information retrieval accuracy with simulated OCR output
Technical Report · OSTI ID:68569
- Univ. of Massachusetts, Amherst, MA (United States)
- Information Science Research Institute, Univ. of Nevada, Las Vegas, NV (United States)
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluation using simulated OCR output on a variety of databases. The results show that high quality OCR devices have little effect on the accuracy of retrieval, but low quality devices used with databases of short documents can result in significant degradation.
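As a rough illustration of what evaluating with simulated OCR output can involve, the sketch below corrupts text at a chosen character error rate and checks how often a query term still appears intact in a document. It is not the simulation model used in the report: the uniform random letter substitution, the exact-match test, the sample strings, and the names `simulate_ocr`, `term_survives`, and `survival_rate` are all assumptions made for this example. It reflects one plausible intuition behind the abstract's conclusion, namely that a short document offers fewer repeated occurrences of a term to survive corruption.

```python
import random
import string


def simulate_ocr(text, char_error_rate, rng):
    """Replace each letter with a random lowercase letter with the given probability.

    Assumption: errors are independent, uniform substitutions. Real OCR errors
    are not uniform, and a replacement may occasionally equal the original letter.
    """
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < char_error_rate:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)
    return "".join(out)


def term_survives(document, term, char_error_rate, rng):
    """True if at least one exact occurrence of `term` remains after corruption."""
    return term in simulate_ocr(document, char_error_rate, rng).split()


def survival_rate(document, term, char_error_rate, trials=2000, seed=0):
    """Fraction of trials in which `term` is still matchable in the corrupted text."""
    rng = random.Random(seed)
    hits = sum(term_survives(document, term, char_error_rate, rng) for _ in range(trials))
    return hits / trials


# Hypothetical documents: the long one simply repeats the short one five times.
short_doc = "retrieval of scanned text"
long_doc = " ".join([short_doc] * 5)

for rate in (0.01, 0.20):
    print(rate,
          survival_rate(short_doc, "retrieval", rate),
          survival_rate(long_doc, "retrieval", rate))
```

Under these assumptions, the term survives almost always at the low error rate, while at the high error rate it is lost far more often from the short document than from the long one.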
- Research Organization: Nevada Univ., Las Vegas, NV (United States)
- OSTI ID: 68569
- Report Number(s): CONF-9404212--
- Country of Publication: United States
- Language: English
Similar Records
Performance evaluation of two OCR systems
Prediction of OCR accuracy using simple image features
An evaluation of an automatic markup system
Technical Report · Fri Dec 30 23:00:00 EST 1994 · OSTI ID:68585
Prediction of OCR accuracy using simple image features
Technical Report · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46719
An evaluation of an automatic markup system
Conference · Fri Mar 31 23:00:00 EST 1995 · OSTI ID:46721