A Survey of Retrieval Strategies for OCR Text Collections
 

Steven M. Beitzel, Eric C. Jensen, David A. Grossman
Information Retrieval Laboratory
Department of Computer Science
Illinois Institute of Technology
{steve,ej,grossman}@ir.iit.edu
Abstract
The importance of effectively retrieving OCR text has grown significantly in recent
years. We provide a brief overview of work done to improve retrieval effectiveness
for OCR text.
Introduction
As electronic media become more prevalent, the need to transfer older
documents to the electronic domain grows. Optical Character Recognition (OCR) works
by scanning source documents and performing character analysis on the resulting images,
producing ASCII text that can then be stored and manipulated
electronically like any standard electronic document. Unfortunately, the character
recognition process is not perfect, and errors often occur. These errors reduce
the effectiveness of information retrieval algorithms that are based on exact
matches between query terms and document terms. Searching OCR data is essentially a search
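
To illustrate why exact term matching suffers on OCR output, the following is a minimal sketch, not a method taken from the survey. The two-document collection, the simulated OCR error ("retrieva1" for "retrieval"), the function names, and the 0.8 similarity cutoff are all assumptions chosen for illustration. The approximate lookup uses the similarity ratio from Python's standard `difflib` module as one generic way to tolerate character errors.

```python
from difflib import get_close_matches

# Hypothetical toy collection: document 2 simulates an OCR error,
# with the digit "1" recognized in place of the letter "l".
docs = {
    1: "information retrieval of scanned text",
    2: "information retrieva1 of scanned text",
}

def build_index(docs):
    """Map each whitespace-separated term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def exact_lookup(index, term):
    """Return documents whose text contains exactly this term."""
    return index.get(term, set())

def approximate_lookup(index, term, cutoff=0.8):
    """Return documents containing any index term similar enough to `term`.

    Similarity is difflib's ratio; the cutoff value is an illustrative assumption.
    """
    matches = get_close_matches(term, list(index), n=len(index), cutoff=cutoff)
    result = set()
    for matched_term in matches:
        result |= index[matched_term]
    return result

index = build_index(docs)
exact_hits = exact_lookup(index, "retrieval")         # misses the corrupted document
approx_hits = approximate_lookup(index, "retrieval")  # also finds the corrupted document
```

In this sketch the exact lookup returns only document 1, because the corrupted term in document 2 is indexed as a different string, while the approximate lookup returns both. A looser cutoff would recover more corrupted terms at the cost of matching unrelated ones.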

  

Source: Argamon, Shlomo - Department of Computer Science, Illinois Institute of Technology

 

Collections: Computer Technologies and Information Sciences