Speaker Adaptation with Limited Data using
Regression-Tree based Spectral Peak Alignment
Shizhen Wang, Xiaodong Cui, Member, IEEE, and Abeer Alwan, Senior Member, IEEE
Abstract-- Spectral mismatch between training and testing utterances
can cause significant degradation in the performance of
automatic speech recognition (ASR) systems. Speaker adaptation
and speaker normalization techniques are usually applied to
address this issue. One way to reduce spectral mismatch is to
reshape the spectrum by aligning corresponding formant peaks.
There are various levels of mismatch in formant structures. In
this paper, regression-tree based phoneme- and state-level spectral
peak alignment is proposed for rapid speaker adaptation using
linearization of the vocal tract length normalization (VTLN)
technique. This method is investigated in a maximum likelihood
linear regression (MLLR)-like framework, taking advantage of
both the efficiency of frequency warping (VTLN) and the reliability
of statistical estimations (MLLR). Two different regression
classes are investigated: one based on phonetic classes (using
combined knowledge and data-driven techniques) and the other