OSTI.GOV: U.S. Department of Energy
Office of Scientific and Technical Information

Title: MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models

Abstract

Gaussian mixture models (GMMs) are among the best speaker recognition algorithms currently available. However, a GMM's estimate of the probability of the speech signal does not change if the temporal order of the feature vectors is randomly shuffled, even though the actual probability of observing the shuffled signal would be dramatically different, probably near zero. A potential way to improve the performance of GMMs is to incorporate temporal information into the estimate of the probability of the data. Doing so could improve speech recognition and speaker recognition, and could potentially aid in detecting lies (abnormalities) in speech data. As described in other documents (Hogden, 1996), MALCOM is an algorithm that can be used to estimate the probability of a sequence of categorical data. MALCOM can also be applied to speech (and other real-valued sequences) if windows of the speech are first categorized using a technique such as vector quantization (Gray, 1984). However, by quantizing the windows of speech, MALCOM ignores information about the within-category differences of the speech windows. Thus, MALCOM and GMMs complement each other: MALCOM is good at using sequence information, whereas GMMs capture within-category differences better than the vector quantization typically used by MALCOM. An extension of MALCOM (MALCOM X) that can be used for estimating the probability of a speech sequence is described here.
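The two limitations named in the abstract can be illustrated with a small sketch. It is not the authors' code and does not implement MALCOM or MALCOM X. The model parameters, the toy feature vectors, and the function names (`gmm_sequence_loglik`, `quantize`) are all assumptions made for illustration. The sketch scores a sequence with a diagonal-covariance GMM, where frames are treated as independent, and quantizes vectors to the nearest codeword.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, weights, means, variances):
    """Log p(x) under the mixture, computed with log-sum-exp for stability."""
    terms = [
        math.log(w) + gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def gmm_sequence_loglik(frames, weights, means, variances):
    """Frames are scored independently, so the total is a plain sum."""
    return math.fsum(
        gmm_frame_loglik(x, weights, means, variances) for x in frames
    )


def quantize(x, codebook):
    """Vector quantization: index of the nearest codeword (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k])),
    )


# Hypothetical two-component, two-dimensional model (illustrative values only).
WEIGHTS = [0.6, 0.4]
MEANS = [[0.0, 0.0], [3.0, 3.0]]
VARIANCES = [[1.0, 1.0], [0.5, 0.5]]

# Toy "feature vectors" standing in for windows of speech.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
shuffled = frames[:]
rng.shuffle(shuffled)

ll_original = gmm_sequence_loglik(frames, WEIGHTS, MEANS, VARIANCES)
ll_shuffled = gmm_sequence_loglik(shuffled, WEIGHTS, MEANS, VARIANCES)
print("order changes the GMM score:", not math.isclose(ll_original, ll_shuffled))

# Two different vectors that fall into the same quantization category.
a, b = [0.1, -0.2], [1.0, 1.2]
code_a, code_b = quantize(a, MEANS), quantize(b, MEANS)
ll_a = gmm_frame_loglik(a, WEIGHTS, MEANS, VARIANCES)
ll_b = gmm_frame_loglik(b, WEIGHTS, MEANS, VARIANCES)
print("same code:", code_a == code_b, "| same GMM score:", math.isclose(ll_a, ll_b))
```

Because the sequence score is a sum of per-frame terms, any permutation of the frames yields the same value. The quantizer assigns `a` and `b` to the same category even though the GMM gives them different densities, which is the within-category information the abstract says is lost. The component means are reused as the codebook here only to keep the sketch short.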

Authors:
Hogden, J.; Scovel, J.C.
Publication Date:
1998-11-01
Research Org.:
Los Alamos National Lab., NM (United States)
Sponsoring Org.:
USDOE, Washington, DC (United States)
OSTI Identifier:
677150
Report Number(s):
LA-UR-98-1378
ON: DE99000844; TRN: AHC29821%%285
DOE Contract Number:
W-7405-ENG-36
Resource Type:
Technical Report
Resource Relation:
Other Information: PBD: [1998]
Country of Publication:
United States
Language:
English
Subject:
99 MATHEMATICS, COMPUTERS, INFORMATION SCIENCE, MANAGEMENT, LAW, MISCELLANEOUS; SPEECH; ALGORITHMS; WAVE FORMS; PATTERN RECOGNITION; MAXIMUM-LIKELIHOOD FIT; PROBABILITY

Citation Formats

Hogden, J., and Scovel, J.C. MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models. United States: N. p., 1998. Web. doi:10.2172/677150.
Hogden, J., & Scovel, J.C. (1998). MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models. United States. doi:10.2172/677150.
Hogden, J., and Scovel, J.C. 1998. "MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models". United States. doi:10.2172/677150. https://www.osti.gov/servlets/purl/677150.
@techreport{osti_677150,
title = {MALCOM X: Combining maximum likelihood continuity mapping with Gaussian mixture models},
author = {Hogden, J. and Scovel, J.C.},
abstractNote = {Gaussian mixture models (GMMs) are among the best speaker recognition algorithms currently available. However, a GMM's estimate of the probability of the speech signal does not change if the temporal order of the feature vectors is randomly shuffled, even though the actual probability of observing the shuffled signal would be dramatically different, probably near zero. A potential way to improve the performance of GMMs is to incorporate temporal information into the estimate of the probability of the data. Doing so could improve speech recognition and speaker recognition, and could potentially aid in detecting lies (abnormalities) in speech data. As described in other documents (Hogden, 1996), MALCOM is an algorithm that can be used to estimate the probability of a sequence of categorical data. MALCOM can also be applied to speech (and other real-valued sequences) if windows of the speech are first categorized using a technique such as vector quantization (Gray, 1984). However, by quantizing the windows of speech, MALCOM ignores information about the within-category differences of the speech windows. Thus, MALCOM and GMMs complement each other: MALCOM is good at using sequence information, whereas GMMs capture within-category differences better than the vector quantization typically used by MALCOM. An extension of MALCOM (MALCOM X) that can be used for estimating the probability of a speech sequence is described here.},
doi = {10.2172/677150},
institution = {Los Alamos National Lab., NM (United States)},
number = {LA-UR-98-1378},
place = {United States},
year = {1998},
month = nov
}

Similar Records:
  • The author describes a novel time-series analysis technique called maximum likelihood continuity mapping (MALCOM) and focuses on one application of MALCOM: detecting fraud in medical insurance claims. Given a training data set composed of typical sequences, MALCOM creates a stochastic model of sequence generation, called a continuity map (CM). A CM maximizes the probability of sequences in the training set given the model constraints. CMs can be used to estimate the likelihood of sequences not found in the training set, enabling anomaly detection and sequence prediction, both important aspects of data mining. Since MALCOM can be used on sequences of categorical data (e.g., sequences of words) as well as real-valued data, MALCOM is also a potential replacement for database search tools such as N-gram analysis. In a recent experiment, MALCOM was used to evaluate the likelihood of patient medical histories, where "medical history" means the sequence of medical procedures performed on a patient. Physicians whose patients had anomalous medical histories (according to MALCOM) were evaluated for fraud by an independent agency. Of the small sample (12 physicians) that has been evaluated, 92% have been determined fraudulent or abusive. Despite the small sample, these results are encouraging.
  • UHMLE is a FORTRAN program to compute maximum likelihood estimates for the parameters (means, covariances, proportions) in a mixture of M multivariate (N-dimensional) normal density functions, given a sample of observation vectors. 1 figure.
  • The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation may decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.
  • Two major issues associated with model validation are addressed here. First, we present a maximum likelihood approach to define and evaluate a model validation metric. The advantages of this approach are that it is more easily applied to nonlinear problems than the methods presented earlier by Hills and Trucano (1999, 2001), that it is based on optimization, for which software packages are readily available, and that it can more easily be extended to handle measurement uncertainty and prediction uncertainty with different probability structures. Several examples are presented utilizing this metric. We show conditions under which this approach reduces to the approach developed previously by Hills and Trucano (2001). Second, we expand our earlier discussions (Hills and Trucano, 1999, 2001) on the impact of multivariate correlation and its effect on model validation metrics. We show that ignoring correlation in multivariate data can lead to misleading results, such as rejecting a good model when sufficient evidence to do so is not available.
  • A speech processing method is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. This learning method uses a set of training data composed only of speech sounds. The speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.