Amplitude Modulation Cues for Perceptual Voicing Distinctions in Noise
Brian P. Strope and Abeer A. Alwan
 

Speech Processing and Auditory Perception Laboratory, Department of Electrical Engineering,
School of Engineering and Applied Sciences, UCLA, 405 Hilgard Ave., Los Angeles, CA 90095
Abstract: This paper describes measurements of the perception of voicing for fricatives in noise. In addition to a
low-frequency spectral cue, voiced fricatives can include a temporal amplitude-modulation cue at the pitch rate in
high-frequency regions. Syllable-initial fricatives [s] and [z] were manually isolated from CV syllables, high-pass
filtered above 3 kHz, and added to flat-spectrum background noise. Subjects were able to discriminate these sounds
below 0 dB SNR. Discrimination thresholds were analyzed using three psychoacoustic models of amplitude-modulation
detection.
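The stimulus construction described in the abstract (a pitch-rate amplitude-modulated high-frequency sound added to flat-spectrum noise at a chosen SNR) can be illustrated with a minimal sketch. It is not the authors' procedure: the study used natural [s] and [z] tokens high-pass filtered above 3 kHz, whereas this sketch substitutes synthetic sinusoidally modulated white noise and omits the filtering step. The function names, the 120 Hz modulation rate, the 0.8 modulation depth, and the 16 kHz sampling rate are all assumptions chosen for illustration.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def am_noise(n, fs, f0, depth, rng):
    """White Gaussian noise with sinusoidal amplitude modulation at f0 Hz.

    Hypothetical stand-in for a voiced fricative: depth (0..1) sets how
    strongly the envelope fluctuates at the pitch rate.
    """
    return [
        (1.0 + depth * math.sin(2.0 * math.pi * f0 * i / fs)) * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]


def mix_at_snr(signal, noise, snr_db):
    """Scale noise so 20*log10(rms(signal)/rms(scaled noise)) equals snr_db, then add."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    scaled = [gain * v for v in noise]
    mixture = [s + v for s, v in zip(signal, scaled)]
    return mixture, scaled


rng = random.Random(0)          # fixed seed so the sketch is repeatable
fs = 16000                      # assumed sampling rate in Hz
target = am_noise(fs // 10, fs, f0=120.0, depth=0.8, rng=rng)   # 100 ms token
masker = [rng.gauss(0.0, 1.0) for _ in range(len(target))]      # flat-spectrum noise
mixture, scaled = mix_at_snr(target, masker, snr_db=0.0)

# Achieved SNR in dB, computed from the target and the scaled masker.
achieved = 20.0 * math.log10(rms(target) / rms(scaled))
```

Setting depth to 0 gives an unmodulated token, a rough analogue of the voiceless case, so the two tokens can be mixed at the same SNR and compared.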
MOTIVATION
Perceiving speech in an acoustically noisy environment requires intelligent use of redundant multi-dimensional
cues spread over wide-ranging time scales. Speech perception research has often focused on spectral cues (from
approximately 400-8000 Hz) that are available through a critical-band filtering mechanism. This approach is
currently used in most automatic speech recognition (ASR) systems, together with a hierarchy of non-stationary
stochastic models operating at the progressively slower rates of the speech-frame, phoneme, word, phrase and even
sentence. More recently, speech perception research has also investigated the role of temporal amplitude-modulation
cues (1) at the syllabic- or articulator-rate of approximately 2-20 Hz, with some application to ASR (2). The current
study examines the perception of pitch-rate amplitude-modulation cues associated with vocal fold vibration at
roughly 80-300 Hz. This dimension is ignored in most ASR systems and in some speech perception studies. The
strident fricatives [s z] are used as a case study.

  

Source: Alwan, Abeer - Electrical Engineering Department, University of California at Los Angeles

 
