Contributed article: Soft vector quantization and the EM algorithm

Ethem Alpaydın
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
Received 27 April 1996; accepted 14 August 1997
The relation between hard c-means (HCM), fuzzy c-means (FCM), fuzzy learning vector quantization (FLVQ), the soft competition scheme
(SCS) of Yair et al. (1992), and probabilistic Gaussian mixtures (GM) has recently been pointed out by Bezdek and Pal (1995). We extend
this relation to their training, showing that the learning rules these models use to estimate the cluster centers can be seen as approximations
to the expectation-maximization (EM) method applied to Gaussian mixtures. HCM and unsupervised LVQ use 1-of-c competition. In
FCM and FLVQ, membership is the $-2/(m-1)$th power of the distance. In SCS and GM, a Gaussian function is used. If the Gaussian
membership function is used, the weighted within-groups sum of squared errors used as the fuzzy objective function corresponds to the
maximum likelihood estimate in Gaussian mixtures with equal priors and covariances. The fuzzy c-means alternating optimization procedure
(FCM-AO), proposed to optimize the former, is then equivalent to batch EM, and SCS's update rule is a variant of the online version of EM.
The advantages of the probabilistic framework are: (i) we no longer have spurious spread parameters that need fine-tuning, such as m in
fuzzy vector quantization or β in SCS; instead we have a variance term that has a sound interpretation and can be estimated from the
sample; (ii) EM guarantees that the likelihood does not decrease, so it converges to the nearest local optimum; (iii) EM also allows us to
estimate the underlying distance norm and the cluster priors, which the other approaches do not allow. We compare Gaussian mixtures
trained with EM against LVQ (HCM), SCS, and FLVQ on the IRIS dataset and find that the Gaussian mixture is more accurate because it
can take covariance information into account. We finally note that vector quantization is generally an intermediate step before finding
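
The three membership schemes contrasted in the summary (1-of-c for HCM and unsupervised LVQ, the $-2/(m-1)$th power of the distance for FCM and FLVQ, and a normalized Gaussian for SCS and GM) can be sketched for one input x and a list of cluster centers. This is a minimal illustration, not code from the article: it assumes Euclidean distance and a single shared spread sigma for the Gaussian case, and the function names (sq_dist, hard_memberships, fuzzy_memberships, gaussian_memberships) are hypothetical.

```python
import math


def sq_dist(x, c):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def hard_memberships(x, centers):
    """1-of-c (winner-take-all) membership, as in HCM and unsupervised LVQ."""
    d = [sq_dist(x, c) for c in centers]
    winner = d.index(min(d))
    return [1.0 if i == winner else 0.0 for i in range(len(centers))]


def fuzzy_memberships(x, centers, m=2.0):
    """FCM/FLVQ-style membership: proportional to the -2/(m-1)th power of the
    Euclidean distance, normalized to sum to one (m must be > 1)."""
    d = [math.sqrt(sq_dist(x, c)) for c in centers]
    if 0.0 in d:
        # x lies exactly on one or more centers: share membership among them
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    w = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(w)
    return [wi / total for wi in w]


def gaussian_memberships(x, centers, sigma=1.0):
    """Normalized Gaussian membership: each component's posterior under equal
    priors and a shared spherical covariance sigma**2 * I."""
    logits = [-sq_dist(x, c) / (2.0 * sigma ** 2) for c in centers]
    top = max(logits)  # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in logits]
    total = sum(w)
    return [wi / total for wi in w]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (2.0, 0.0)]
    x = (0.5, 0.0)
    print(hard_memberships(x, centers))      # [1.0, 0.0]
    print(fuzzy_memberships(x, centers))     # approximately [0.9, 0.1]
    print(gaussian_memberships(x, centers))  # approximately [0.731, 0.269]
```

As m approaches 1 or sigma approaches 0, both soft rules approach the hard 1-of-c assignment, which is why these spread parameters control how soft the competition is.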

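The correspondence the summary draws between FCM-AO with Gaussian memberships and batch EM can be illustrated with a small batch EM loop. It is a sketch under stated assumptions: equal, fixed priors and one shared spherical covariance sigma2 * I, matching the equal-priors-and-covariances case in the summary; estimating the priors and a full covariance (the distance norm in point (iii)) is left out. The function name em_gaussian_mixture is hypothetical. The E-step computes normalized Gaussian memberships, the M-step sets each center to the membership-weighted mean of the data, and sigma2 is re-estimated from the sample rather than tuned by hand.

```python
import math


def em_gaussian_mixture(X, centers, sigma2=1.0, n_iter=50):
    """Batch EM for a Gaussian mixture with equal, fixed priors and one shared
    spherical covariance sigma2 * I. Returns (centers, sigma2)."""
    X = [[float(v) for v in x] for x in X]
    centers = [[float(v) for v in c] for c in centers]
    n, d, k = len(X), len(X[0]), len(centers)

    def sq(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    for _ in range(n_iter):
        # E-step: h[t][i] is proportional to exp(-||x_t - m_i||^2 / (2 sigma2))
        H = []
        for x in X:
            logits = [-sq(x, c) / (2.0 * sigma2) for c in centers]
            top = max(logits)
            w = [math.exp(v - top) for v in logits]
            s = sum(w)
            H.append([wi / s for wi in w])
        # M-step for the means: membership-weighted average of the data
        for i in range(k):
            wsum = sum(h[i] for h in H)
            if wsum > 0.0:  # keep a center unchanged if it lost all membership
                centers[i] = [sum(h[i] * x[j] for h, x in zip(H, X)) / wsum
                              for j in range(d)]
        # M-step for the shared variance: estimated from the sample, not hand-tuned
        sigma2 = sum(h[i] * sq(x, centers[i])
                     for h, x in zip(H, X) for i in range(k)) / (n * d)
        sigma2 = max(sigma2, 1e-12)  # guard against numerical collapse
    return centers, sigma2


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, sigma2 = em_gaussian_mixture(X, [(1, 1), (9, 9)])
    print(centers, sigma2)
```

On these two well-separated square clusters of four points each, the sketch ends with centers near (0.5, 0.5) and (10.5, 10.5) and sigma2 near 0.25, the per-dimension variance within each cluster.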

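The online counterpart, to which the summary relates SCS's update rule, can be sketched as a stochastic soft-competitive update: every center moves toward each incoming sample in proportion to its Gaussian membership. This is a generic illustration, not Yair et al.'s exact scheme; the fixed spread sigma, the fixed learning rate eta, the epoch loop, and the name online_soft_update are all assumptions.

```python
import math
import random


def online_soft_update(X, centers, sigma=1.0, eta=0.05, n_epochs=50, seed=0):
    """Online soft-competitive update: after each sample x, every center m_i
    moves toward x by eta * h_i * (x - m_i), where h_i is its normalized
    Gaussian membership computed from the current centers."""
    rng = random.Random(seed)
    X = [[float(v) for v in x] for x in X]
    centers = [[float(v) for v in c] for c in centers]
    order = list(range(len(X)))
    for _ in range(n_epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for t in order:
            x = X[t]
            logits = [-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * sigma ** 2)
                      for c in centers]
            top = max(logits)
            w = [math.exp(v - top) for v in logits]
            s = sum(w)
            # Memberships use the pre-update centers; then all centers move at once
            centers = [[ci + eta * (wi / s) * (xi - ci) for ci, xi in zip(c, x)]
                       for c, wi in zip(centers, w)]
    return centers


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(online_soft_update(X, [(1, 1), (9, 9)]))
```

With a fixed eta the centers keep fluctuating slightly around the batch solution; letting eta decrease over time, as in stochastic approximation, lets them settle.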
Source: Alpaydın, Ethem - Department of Computer Engineering, Boğaziçi University

