Information Retrieval and OCR: From Converting Content to Grasping Meaning

Summary:
Jamie Callan, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15241, USA, callan@cmu.edu
Paul Kantor, School of Communication, Information and Library Studies, Rutgers University, New Brunswick, NJ 08901, USA, kantor@scils.rutgers.edu
David Grossman, Computer Science Dept, Illinois Institute of Technology, Chicago, IL 60616, USA, grossman@iit.edu
IR and OCR have largely developed independent standards and metrics, with OCR focused on literal
accuracy and IR focused on essential "content/meaning". With more and more media appearing not only on paper but in
multiple image formats, the opportunities and challenges for OCR on new formats (video and still images)
are enormous. While OCR is assessed with metrics that emphasize words and characters, IR has learned to
apply end-to-end metrics that ask whether the needs of users can be met by existing systems. The same
considerations also apply to the problem of providing permanent worldwide access to millions of pages of
legacy print documents, representing the shared human record as it existed until just a few years ago.
The International Society for Optical Engineering (SPIE) has held a series of Document Recognition and
Retrieval (DRR) conferences. The tenth, DRR X, will be held in January 2003 in Santa Clara, California. In
2001, Dan LoPresti of Bell Labs decided that the area would benefit from more intense collaboration
between those who specialize in finding the words on a page image, and those researchers who know how to


Source: Argamon, Shlomo - Department of Computer Science, Illinois Institute of Technology
Rutgers University, Rutgers Center for Operations Research


Collections: Computer Technologies and Information Sciences; Engineering