On the Selection and Classification of Features for Speaker Recognition
Doctoral thesis, 2010
Speaker recognition is the process of automatically recognizing who is speaking based on information provided by speech signals. Speaker recognition still has many unsolved problems: each person has a unique voice that distinguishes him or her from other people, yet recognizing who is talking based only on speech is not an easy task. In this thesis, we propose methods to improve the performance of the speaker recognizer and address issues related to performance measurement in speaker recognition systems.
This thesis consists of three main research parts, focusing on the problems of feature extraction, speaker modeling, and performance evaluation in speaker recognition systems.
In the feature extraction part, we focus on the extraction of phase information features and of features inspired by the physiological functions of the brain, in order to improve the performance of speaker recognition systems. We then study an information-theoretic method to compute the amount of information that a feature set can contain about the speaker.
In the speaker modeling part, we develop a step descent algorithm that can be used as an alternative to the Expectation Maximization (EM) algorithm to tackle its convergence problems. Moreover, we discuss the estimation of the speaker model parameters using Bayes estimation as a way to improve performance and reduce the mismatch between training and evaluation. We also propose a modeling approach based on discriminative weights, with complexity similar to that of the conventional modeling technique used for speaker identification systems.
In the performance evaluation part, we propose a statistical method based on the
log-likelihood from a set of speaker test samples to estimate the
probability of error when the number of available tests is limited.
feature extraction
mutual information
estimation
phase information
Bayes procedures
speaker recognition
discriminative modeling