Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

Sejnowski, T J; Goldstein, M

Technical Report · January 1, 1990

OSTI ID:6095406

This research sought to develop a massively parallel network architecture that could interpret speech from video recordings of human talkers. This report summarizes the project's results: (1) a corpus of video recordings of two human speakers was analyzed with image-processing techniques and used as the data for this study; (2) we demonstrated that a feedforward network could be trained to categorize vowels from these talkers, with performance comparable to that of nearest-neighbor techniques and of trained humans on the same data; (3) we developed a novel approach to sensory fusion by training a network to map facial images to short-time spectral amplitude envelopes, information that can be used to increase the signal-to-noise ratio, and hence the performance, of acoustic speech recognition systems in noisy environments; (4) we explored the use of recurrent networks to perform the same mapping for continuous speech. The results demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

OSTI does not have a digital full text copy available.

Research Organization: Johns Hopkins Univ., Baltimore, MD (USA)

Report Number(s): AD-A-226968/6/XAB; CNN: AFOSR-86-0246

Country of Publication: United States

Language: English

Similar Records

Massively-parallel architectures for automatic recognition of visual speech signals. Annual report

Technical Report · October 12, 1988 · OSTI ID:6095406

Sejnowski, T J

Adding articulatory features to acoustic features for automatic speech recognition

Journal Article · May 1, 1995 · Journal of the Acoustical Society of America · OSTI ID:6095406

Zlokarnik, I

Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

Technical Report · November 5, 1996 · OSTI ID:6095406

Hogden, J

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
PARALLEL PROCESSING
SPEECH SYNTHESIZERS
ARRAY PROCESSORS
AUGMENTATION
CLASSIFICATION
COMPUTER ARCHITECTURE
COMPUTER NETWORKS
DATA PROCESSING
IMAGE PROCESSING
INFORMATION SYSTEMS
NEURAL NETWORKS
PATTERN RECOGNITION
PROGRESS REPORT
SPEECH
TECHNOLOGY ASSESSMENT
VIDEO TAPES
DOCUMENT TYPES
ELECTRONIC EQUIPMENT
EQUIPMENT
MAGNETIC STORAGE DEVICES
MAGNETIC TAPES
MEMORY DEVICES
PROCESSING
PROGRAMMING
990200* - Mathematics & Computers
