OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Script and language determination from document images

Abstract

We have developed techniques for distinguishing which language is represented in an image of text. This work is restricted to a small but important subset of the world's languages, using techniques that should be applicable across much more comprehensive samples. The method first classifies the script into two broad classes: European and Asian. This classification is based on the spatial relationships of fiducial points related to the upward concavities in character structures. Language identification within the Asian script class (Japanese, Chinese, Korean) is performed by analysis of the optical density distribution of the text images. Within the European script class, language identification is described in separate papers.
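
As a rough illustration of what an "optical density distribution" could look like in practice, the sketch below computes the fraction of ink pixels in each binarized text region and summarizes those fractions as a normalized histogram. The abstract does not say how regions are delimited, how density is measured, or how the distribution is used for classification, so the bitmap encoding (1 = ink, 0 = background), the function names `optical_density` and `density_histogram`, and the bin count are all assumptions made for illustration, not the report's procedure.

```python
from typing import List, Sequence

# Assumed encoding: each region is a binarized bitmap, 1 = ink, 0 = background.
Bitmap = Sequence[Sequence[int]]


def optical_density(cell: Bitmap) -> float:
    """Return the fraction of ink pixels in one binarized region."""
    total = sum(len(row) for row in cell)
    if total == 0:
        raise ValueError("empty region")
    ink = sum(sum(row) for row in cell)
    return ink / total


def density_histogram(cells: Sequence[Bitmap], bins: int = 10) -> List[float]:
    """Return a normalized histogram of per-region densities over [0, 1]."""
    if not cells:
        raise ValueError("no regions")
    counts = [0] * bins
    for cell in cells:
        d = optical_density(cell)
        # A density of exactly 1.0 is placed in the last bin.
        idx = min(int(d * bins), bins - 1)
        counts[idx] += 1
    n = len(cells)
    return [c / n for c in counts]


if __name__ == "__main__":
    sparse = [[0, 0], [0, 1]]  # one ink pixel out of four
    dense = [[1, 1], [1, 0]]   # three ink pixels out of four
    print(density_histogram([sparse, dense], bins=4))
```

A classifier built on such a feature would compare histograms from an unknown page against reference histograms for each candidate language; that comparison step is left out here because the abstract gives no detail on it.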

Authors:
 Spitz, A L [1]
  1. Fuji Xerox Palo Alto Laboratory, Palo Alto, CA (United States)
Publication Date:
Research Org.:
Nevada Univ., Las Vegas, NV (United States)
OSTI Identifier:
68579
Report Number(s):
CONF-9404212-
TRN: 95:004349-0019
Resource Type:
Technical Report
Resource Relation:
Conference: 3. annual symposium on document analysis and information retrieval, Las Vegas, NV (United States), 11-13 Apr 1994; Other Information: PBD: 1994; Related Information: Is Part Of Third Annual Symposium on Document Analysis and Information Retrieval; PB: 484 p.
Country of Publication:
United States
Language:
English
Subject:
99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; MACHINE TRANSLATIONS; ACCURACY; OPTICAL SCANNERS; SPATIAL DISTRIBUTION; CLASSIFICATION; INFORMATION; IMAGES

Citation Formats

Spitz, A L. Script and language determination from document images. United States: N. p., 1994. Web.
Spitz, A L. Script and language determination from document images. United States.
Spitz, A L. 1994. "Script and language determination from document images". United States.
@article{osti_68579,
title = {Script and language determination from document images},
author = {Spitz, A L},
abstractNote = {We have developed techniques for distinguishing which language is represented in an image of text. This work is restricted to a small but important subset of the world's languages, using techniques that should be applicable across much more comprehensive samples. The method first classifies the script into two broad classes: European and Asian. This classification is based on the spatial relationships of fiducial points related to the upward concavities in character structures. Language identification within the Asian script class (Japanese, Chinese, Korean) is performed by analysis of the optical density distribution of the text images. Within the European script class, language identification is described in separate papers.},
doi = {},
url = {https://www.osti.gov/biblio/68579},
journal = {},
number = {},
volume = {},
place = {United States},
year = {1994},
month = {12}
}

