E-print Network

A Survey of Retrieval Strategies for OCR Text Collections

Summary: A Survey of Retrieval Strategies for OCR Text Collections
Steven M. Beitzel, Eric C. Jensen, David A. Grossman
Information Retrieval Laboratory
Department of Computer Science
Illinois Institute of Technology
The importance of effectively retrieving OCR text has grown significantly in recent
years. We provide a brief overview of work done to improve the effectiveness of
retrieval of OCR text.
As electronic media become more prevalent, the need to transfer older
documents to the electronic domain grows. Optical Character Recognition (OCR) works
by scanning source documents and performing character analysis on the resulting images,
producing ASCII text that can then be stored and manipulated like any other
electronic document. Unfortunately, the character
recognition process is not perfect, and errors often occur. These errors have an adverse
effect on the effectiveness of information retrieval algorithms that are based on exact
matches of query terms and document terms. Searching OCR data is essentially a search
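
The effect of recognition errors on exact-match retrieval can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the survey: the function names `exact_match` and `fuzzy_match`, the sample strings, the 0.8 similarity threshold, and the use of Python's `difflib.SequenceMatcher` as a stand-in for approximate matching.

```python
from difflib import SequenceMatcher


def exact_match(query_term, doc_terms):
    """Return True only if the query term appears verbatim in the document."""
    return query_term in doc_terms


def fuzzy_match(query_term, doc_terms, threshold=0.8):
    """Return True if any document term is similar enough to the query term.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return any(
        SequenceMatcher(None, query_term, term).ratio() >= threshold
        for term in doc_terms
    )


# Hypothetical documents: the second simulates common OCR confusions
# ("m" read as "rn", "l" read as "1").
clean_doc = "information retrieval".split()
ocr_doc = "inforrnation retrieva1".split()

query = "information"
found_clean = exact_match(query, clean_doc)      # verbatim term is present
found_ocr_exact = exact_match(query, ocr_doc)    # corrupted term is missed
found_ocr_fuzzy = fuzzy_match(query, ocr_doc)    # approximate match recovers it
```

Under these assumptions, the exact-match lookup finds the term in the clean text but misses it in the simulated OCR output, while the approximate comparison still recovers it, at the cost of possibly matching unrelated terms if the threshold is set too low.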


Source: Argamon, Shlomo - Department of Computer Science, Illinois Institute of Technology


Collections: Computer Technologies and Information Sciences