New Ideas for Speech Recognition and Related Technologies
Abstract
The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan, and Larry Ng ensued and have led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro-power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech-related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL, researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.
- Authors:
- Holzrichter, J F
- Publication Date:
- 2002-06-17
- Research Org.:
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 15004194
- Report Number(s):
- UCRL-ID-120310
TRN: US201015%%398
- DOE Contract Number:
- W-7405-ENG-48
- Resource Type:
- Technical Report
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 99; ACOUSTICS; ALGORITHMS; ELECTROMAGNETIC RADIATION; ORGANS; RADAR; SCATTERING; SPEECH; VECTORS; VELOCITY
Citation Formats
Holzrichter, J F. New Ideas for Speech Recognition and Related Technologies. United States: N. p., 2002.
Web. doi:10.2172/15004194. https://www.osti.gov/servlets/purl/15004194.
@article{osti_15004194,
title = {New Ideas for Speech Recognition and Related Technologies},
author = {Holzrichter, J F},
abstractNote = {The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan, and Larry Ng ensued and have led to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro-power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstrated by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech-related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL, researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.},
doi = {10.2172/15004194},
place = {United States},
year = {2002},
month = {6}
}
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model-based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge, or in the formulation of our knowledge, about articulation may …
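The articulatory-continuity idea in the abstract above can be illustrated with a toy smoothness score: candidate articulator trajectories are penalized for large frame-to-frame jumps, so slowly varying paths, like real tongue and lip motions, are preferred. This is only a hedged sketch of the general constraint, not the Malcom algorithm itself; the quadratic penalty, its weight, and the example trajectories are all illustrative assumptions.

```python
def continuity_score(path, weight=1.0):
    """Toy articulatory-continuity penalty: weighted sum of squared
    frame-to-frame jumps along a 1-D articulator trajectory.
    Lower scores mean smoother, more speech-like motion."""
    return weight * sum((b - a) ** 2 for a, b in zip(path, path[1:]))

smooth = [0.0, 0.1, 0.2, 0.3, 0.4]   # slow, tongue-like motion
jumpy  = [0.0, 0.4, 0.0, 0.4, 0.0]   # physically implausible jumps

print(continuity_score(smooth))  # small penalty
print(continuity_score(jumpy))   # much larger penalty
```

In a recognizer, such a penalty would bias the search toward hypotheses whose implied articulator paths vary slowly, which is the intuition behind continuity mapping.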
An articulatorily constrained, maximum entropy approach to speech recognition and speech coding
Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary, and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are …
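The data-driven likelihood computation underlying HMM recognition can be sketched with the standard forward algorithm, which sums the joint probability of the observations and a hidden-state path over all such paths. The two-state model and every probability value below are invented for illustration; they are not taken from the report.

```python
def forward(obs, init, trans, emit):
    """Standard HMM forward algorithm: returns P(obs | model).
    init[i]     -- P(state i at t=0)
    trans[i][j] -- P(state j at t+1 | state i at t)
    emit[i][o]  -- P(observation symbol o | state i)
    """
    n = len(init)
    # Initialize with the first observation.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    # Propagate: sum over predecessor states, then emit.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Toy 2-state model with 2 observation symbols (all numbers illustrative).
init  = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit  = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], init, trans, emit))
```

In training, the transition and emission tables would be re-estimated from data (e.g., by Baum-Welch), which is exactly the "parameters determined by the data" property the abstract highlights.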
Northeast Artificial Intelligence Consortium (NAIC). Volume 8. Artificial intelligence applications to speech recognition. Final report, Sep 84-Dec 89
The Northeast Artificial Intelligence Consortium (NAIC) was created by the Air Force Systems Command, Rome Air Development Center, and the Office of Scientific Research. Its purpose was to conduct pertinent research in artificial intelligence and to perform activities ancillary to this research. This report describes progress, during the existence of the NAIC, on the technical research tasks undertaken at the member universities. The topics covered in general are: versatile expert system for equipment maintenance, distributed AI for communications system control, automatic photointerpretation, time-oriented problem solving, speech understanding systems, knowledge based maintenance, hardware architectures for very large systems, knowledge based reasoning …
Massively-parallel architectures for automatic recognition of visual speech signals. Annual report
Significant progress was made on the primary objective of estimating the acoustic characteristics of speech from visual speech signals. Neural networks were trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The performance of the networks was better than that of trained humans and was comparable with optimized pattern classifiers. The approach avoids the problems of information loss through early categorization. The acoustic information the network extracts from the visual signal can be used to supplement …
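The visual-to-acoustic mapping described above is, at its core, a regression from pixel vectors to spectral-envelope vectors. A minimal sketch is shown below: a single linear layer trained by gradient descent on mean squared error. The tiny invented "images", envelope targets, learning rate, and layer size are all illustrative assumptions; the report's actual networks and data were far larger.

```python
def train_linear_map(X, Y, lr=0.1, epochs=500):
    """Train a single linear layer W (no bias) mapping pixel
    vectors X[k] to spectral-envelope vectors Y[k] by plain
    per-sample gradient descent on squared error."""
    n_in, n_out = len(X[0]), len(Y[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = [sum(W[o][i] * x[i] for i in range(n_in)) for o in range(n_out)]
            for o in range(n_out):
                err = pred[o] - y[o]              # signed prediction error
                for i in range(n_in):
                    W[o][i] -= lr * err * x[i]    # gradient step
    return W

# Two toy "images" (3 pixels each) paired with 2-bin envelope targets.
X = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
Y = [[0.8, 0.2], [0.3, 0.7]]
W = train_linear_map(X, Y)
```

A real lipreading system would replace the linear layer with a multi-layer network and feed it aligned, preprocessed face images, but the training loop has the same structure.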
Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report
This research sought to produce a massively-parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) a corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) we demonstrated that a feed-forward network could be trained to categorize vowels from these talkers, with performance comparable to that of nearest-neighbor techniques and to trained humans on the same data; (3) we developed a novel approach to sensory fusion by training a network to …