U.S. Department of Energy
Office of Scientific and Technical Information

Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

Technical Report
DOI: https://doi.org/10.2172/431136 · OSTI ID: 431136
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that constraints imposed by articulator motions can be used to improve or replace current speech recognition algorithms based on hidden Markov models. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge of articulation, or in the way that knowledge is formulated, may decrease recognition performance; (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models; and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and, in fact, unique) quality of Malcom is that, even though it makes use of a mapping between acoustics and articulation, it can be trained to recognize speech using only acoustic data. By learning this mapping from acoustic data alone, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments demonstrating that Malcom can learn the mapping between acoustics and articulation are discussed, as are potential applications of Malcom beyond speech recognition. Finally, specific deliverables resulting from the proposed research are described.
Research Organization:
Los Alamos National Lab., NM (United States)
Sponsoring Organization:
USDOE, Washington, DC (United States)
DOE Contract Number:
W-7405-ENG-36
OSTI ID:
431136
Report Number(s):
LA-UR--96-3945; ON: DE97002800
Country of Publication:
United States
Language:
English

Similar Records

A maximum likelihood approach to estimating articulator positions from speech acoustics
Technical Report · Mon Sep 23 00:00:00 EDT 1996 · OSTI ID:451192

Speech processing using maximum likelihood continuity mapping
Patent · Fri Dec 31 23:00:00 EST 1999 · OSTI ID:872961

Speech processing using maximum likelihood continuity mapping
Patent · Tue Apr 18 00:00:00 EDT 2000 · OSTI ID:20023241