Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report
This research sought to produce a massively parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) a corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) we demonstrated that a feedforward network could be trained to categorize vowels from these talkers, with performance comparable to that of nearest-neighbor techniques and of trained humans on the same data; (3) we developed a novel approach to sensory fusion by training a network to transform facial images into short-time spectral amplitude envelopes, information that can be used to increase the signal-to-noise ratio, and hence the performance, of acoustic speech recognition systems in noisy environments; (4) we explored the use of recurrent networks to perform the same mapping for continuous speech. The results demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.
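For readers unfamiliar with the nearest-neighbor baseline named in result (2), the following is a minimal sketch of a one-nearest-neighbor vowel classifier. It is not the report's implementation: the function name `nearest_neighbor_classify`, the two-dimensional feature vectors (imagined here as mouth width and mouth height), and the vowel labels are all hypothetical, chosen only to illustrate the idea of labeling a new image by its closest training example.

```python
import math


def nearest_neighbor_classify(train_vectors, train_labels, query):
    """Return the label of the training vector closest to `query`.

    Closeness is measured with Euclidean distance. This is a generic
    1-nearest-neighbor rule, assumed here for illustration only.
    """
    best_label = None
    best_dist = math.inf
    for vec, label in zip(train_vectors, train_labels):
        dist = math.dist(vec, query)
        if dist < best_dist:
            best_dist = dist
            best_label = label
    return best_label


# Hypothetical toy features (e.g. mouth width, mouth height).
# These values are invented and are not data from the report.
train_vectors = [(0.9, 0.2), (0.8, 0.3), (0.3, 0.9), (0.2, 0.8)]
train_labels = ["i", "i", "a", "a"]

predicted = nearest_neighbor_classify(train_vectors, train_labels, (0.25, 0.85))
print(predicted)
```

A trained feedforward network would replace the distance lookup with a learned mapping from the same kind of input vector to a vowel category; the abstract reports that the two approaches performed comparably on the project's data.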
- Research Organization:
- Johns Hopkins Univ., Baltimore, MD (USA)
- OSTI ID:
- 6095406
- Report Number(s):
- AD-A-226968/6/XAB; CNN: AFOSR-86-0246
- Country of Publication:
- United States
- Language:
- English
Similar Records
Adding articulatory features to acoustic features for automatic speech recognition
Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding
Related Subjects
PARALLEL PROCESSING
SPEECH SYNTHESIZERS
ARRAY PROCESSORS
AUGMENTATION
CLASSIFICATION
COMPUTER ARCHITECTURE
COMPUTER NETWORKS
DATA PROCESSING
IMAGE PROCESSING
INFORMATION SYSTEMS
NEURAL NETWORKS
PATTERN RECOGNITION
PROGRESS REPORT
SPEECH
TECHNOLOGY ASSESSMENT
VIDEO TAPES
DOCUMENT TYPES
ELECTRONIC EQUIPMENT
EQUIPMENT
MAGNETIC STORAGE DEVICES
MAGNETIC TAPES
MEMORY DEVICES
PROCESSING
PROGRAMMING
990200* - Mathematics & Computers