Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
- Research Organization:
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE, Washington, DC (United States)
- DOE Contract Number:
- W-7405-ENG-36
- OSTI ID:
- 431136
- Report Number(s):
- LA-UR-96-3945; ON: DE97002800; TRN: 97:001008
- Resource Relation:
- Other Information: PBD: 5 Nov 1996
- Country of Publication:
- United States
- Language:
- English
Similar Records
An articulatorily constrained, maximum entropy approach to speech recognition and speech coding
Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs