Title: Binarization and multi-thresholding of document images using connectivity

Technical Report · OSTI ID: 68580
 [1]
  1. AT&T Bell Laboratories, Murray Hill, NJ (United States)

Thresholding is a common image processing operation applied to gray-scale images to obtain binary or multi-level images. This report describes a thresholding method that is global in approach but uses a measure of local information, namely connectivity. Thresholds are chosen at the intensity levels that best preserve the connectivity of regions within the image, so the method combines advantages of global and locally adaptive approaches. In experimental comparisons on document images, the connectivity-preserving method improves subsequent OCR recognition rates from about 95% to 97.5% and reduces the rate of binarization failures (text so poorly binarized as to be totally unrecognizable by a commercial OCR system) from 33% to 6% on difficult images.
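
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the report's method. It assumes that "connectivity" can be approximated by the number of horizontal runs of dark pixels, and that a good threshold lies in the middle of the widest range of intensity levels over which that count stays constant. The function names (`count_runs`, `stable_threshold`, `binarize`) and the toy image are hypothetical and chosen for illustration.

```python
def count_runs(image, t):
    """Count horizontal runs of pixels darker than threshold t.

    Assumption: run count is used here as a cheap stand-in for a
    connectivity measure; the report may define connectivity differently.
    """
    runs = 0
    for row in image:
        inside = False
        for value in row:
            dark = value < t
            if dark and not inside:
                runs += 1
            inside = dark
    return runs


def stable_threshold(image, levels=256):
    """Return the midpoint of the widest threshold range whose run count
    is constant and nonzero (i.e. where connectivity does not change)."""
    counts = [count_runs(image, t) for t in range(levels)]
    best_start, best_len = 0, 0
    start = 0
    for t in range(1, levels + 1):
        # Close the current plateau when the count changes or levels run out.
        if t == levels or counts[t] != counts[start]:
            if counts[start] > 0 and t - start > best_len:
                best_start, best_len = start, t - start
            start = t
    return best_start + best_len // 2


def binarize(image, t):
    """Map pixels darker than t to 1 (ink) and all others to 0 (paper)."""
    return [[1 if v < t else 0 for v in row] for row in image]


# Toy gray-scale "document": paper near 200, ink near 50-60, light noise at 190.
toy = [
    [200, 200, 50, 50, 200, 60, 200],
    [200, 190, 50, 200, 200, 60, 200],
    [200, 200, 200, 200, 190, 200, 200],
]
t = stable_threshold(toy)
binary = binarize(toy, t)
```

On the toy image the run count is unchanged for every threshold between the ink and noise intensities, so the selected threshold falls in that gap and the noise pixels are left as background. A single global threshold is still applied, which mirrors the abstract's description of a global method driven by local information; a real implementation would likely use true two-dimensional connected components and guard against the trivial plateau where every pixel is classified as ink.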

Research Organization:
Nevada Univ., Las Vegas, NV (United States)
OSTI ID:
68580
Report Number(s):
CONF-9404212-; TRN: 95:004349-0020
Resource Relation:
Conference: 3. annual symposium on document analysis and information retrieval, Las Vegas, NV (United States), 11-13 Apr 1994; Other Information: PBD: 1994; Related Information: Is Part Of Third Annual Symposium on Document Analysis and Information Retrieval; PB: 484 p.
Country of Publication:
United States
Language:
English