
Title: Speaker recognition through NLP and CWT modeling.

Abstract

The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds) and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel), while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
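
As a rough illustration of the wavelet-based idea in the abstract, the sketch below applies a continuous wavelet transform to a synthetic vowel-like signal. It is not the authors' algorithm: the record does not name the wavelet, the sampling rate, or the "fast" CWT implementation, so the complex Morlet wavelet, the direct-convolution CWT, the toy signal (an impulse train exciting a single damped resonance), and every function name and parameter value here are assumptions made for illustration only.

```python
import numpy as np

def synth_vowel(fs=8000, n=4000, period=64, formant_hz=750.0, bandwidth_hz=80.0):
    """Toy vowel (assumed model): glottal impulses exciting one damped resonance."""
    impulses = np.zeros(n)
    impulses[::period] = 1.0                      # one glottal pulse every `period` samples
    t = np.arange(400) / fs                       # 50 ms impulse response
    response = np.exp(-np.pi * bandwidth_hz * t) * np.sin(2 * np.pi * formant_hz * t)
    return np.convolve(impulses, response)[:n]

def cwt_magnitude(signal, fs, freqs, w0=6.0):
    """|CWT| with a complex Morlet wavelet, one row per analysis frequency in Hz."""
    mags = np.empty((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        scale = w0 / (2 * np.pi * f)              # scale (seconds) centred on frequency f
        half = int(np.ceil(4 * scale * fs))       # keep +/- 4 standard deviations
        x = np.arange(-half, half + 1) / (fs * scale)
        wavelet = np.exp(1j * w0 * x) * np.exp(-0.5 * x**2)
        kernel = np.conj(wavelet)[::-1]           # convolution with this = correlation
        # Dividing by the scale (L1 normalisation) keeps tone heights comparable across rows.
        mags[i] = np.abs(np.convolve(signal, kernel, mode="same")) / (scale * fs)
    return mags

def peak_formant(mags, freqs):
    """Analysis frequency with the largest time-averaged magnitude."""
    return freqs[np.argmax(mags.mean(axis=1))]

def pitch_period(envelope, min_lag=40, max_lag=100):
    """Pulse spacing in samples, from the autocorrelation peak of a CWT row."""
    env = envelope - envelope.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    return min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

fs = 8000
freqs = np.arange(250, 2001, 50)
signal = synth_vowel(fs=fs)
mags = cwt_magnitude(signal, fs, freqs)
print("strongest band (Hz):", peak_formant(mags, freqs))
print("pulse spacing (samples):", pitch_period(mags[-1]))
```

In this toy setting the low-frequency rows, which have fine frequency resolution, locate the resonance (a stand-in for a formant), while the highest-frequency row, which has fine time resolution, responds to each excitation pulse and so exposes the pulse spacing. Real speech would need a measured waveform and a far more careful treatment of both features.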

Authors:
Brown-VanHoozer, A; Kercel, S W; Tucker, R W
Publication Date:
1999-06-23
Research Org.:
Argonne National Lab., IL (US)
Sponsoring Org.:
US Department of Energy (US)
OSTI Identifier:
11824
Report Number(s):
ANL/ED/CP-99102
TRN: AH200118%%294
DOE Contract Number:  
W-31109-ENG-38
Resource Type:
Conference
Resource Relation:
Conference: 15th Annual NDIA Security Technology Symposium and Exhibition, Norfolk, VA (US), 06/14/1999--06/17/1999; Other Information: PBD: 23 Jun 1999
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ACCURACY; ALGORITHMS; EXCITATION; PROGRAMMING; SECURITY; TRANSCRIPTION; WAVE FORMS

Citation Formats

Brown-VanHoozer, A, Kercel, S W, and Tucker, R W. Speaker recognition through NLP and CWT modeling. United States: N. p., 1999. Web.
Brown-VanHoozer, A, Kercel, S W, & Tucker, R W. Speaker recognition through NLP and CWT modeling. United States.
Brown-VanHoozer, A, Kercel, S W, and Tucker, R W. 1999. "Speaker recognition through NLP and CWT modeling." United States. https://www.osti.gov/servlets/purl/11824.
@inproceedings{osti_11824,
title = {Speaker recognition through {NLP} and {CWT} modeling},
author = {Brown-VanHoozer, A and Kercel, S W and Tucker, R W},
abstractNote = {The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds) and with better than 90\% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the ``huge population'' problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel), while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.},
url = {https://www.osti.gov/biblio/11824},
booktitle = {15th Annual NDIA Security Technology Symposium and Exhibition, Norfolk, VA (US), 06/14/1999--06/17/1999},
place = {United States},
year = {1999},
month = {6}
}

