Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition
Journal article, 2007
This paper presents the scheme and evaluation of a robust audio-visual digit-and-speaker-recognition system using lip
motion and speech biometrics. Moreover, a liveness verification barrier based on a person’s lip movement is added to the system to
guard against advanced spoofing attempts such as replayed videos. The acoustic and visual features are integrated at the feature level
and evaluated first by a Support Vector Machine for digit and speaker identification and, then, by a Gaussian Mixture Model for speaker
verification. Based on approximately 300 different personal identities, this paper represents, to our knowledge, the first extensive study investigating
the added value of lip motion features for speaker and speech-recognition applications. Digit recognition and person-identification and
verification experiments are conducted on the publicly available XM2VTS database showing favorable results (speaker verification is
98 percent, speaker identification is 100 percent, and digit identification is 83 percent to 100 percent).
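
As an illustration of the feature-level integration and GMM-based verification described above, the following is a minimal sketch rather than the paper's implementation. It assumes that fusion means concatenating time-aligned per-frame acoustic and visual vectors, that the GMMs have diagonal covariances, and that a claim is accepted when the log-likelihood ratio between a client model and a background ("world") model exceeds a threshold. The function names `fuse`, `gmm_log_likelihood`, and `verify`, the world model, and the threshold are all hypothetical; the abstract does not specify them.

```python
import math


def fuse(acoustic, visual):
    """Feature-level fusion (assumed): concatenate aligned per-frame vectors."""
    if len(acoustic) != len(visual):
        raise ValueError("modalities must be aligned to the same number of frames")
    return [list(a) + list(v) for a, v in zip(acoustic, visual)]


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        component_lls = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            component_lls.append(ll)
        # Log-sum-exp over mixture components for numerical stability.
        peak = max(component_lls)
        total += peak + math.log(sum(math.exp(c - peak) for c in component_lls))
    return total / len(frames)


def verify(frames, client, world, threshold=0.0):
    """Accept the identity claim if the log-likelihood ratio exceeds the threshold.

    `client` and `world` are (weights, means, variances) tuples; the world
    model and the threshold are assumptions made for this sketch.
    """
    score = gmm_log_likelihood(frames, *client) - gmm_log_likelihood(frames, *world)
    return score > threshold, score


# Toy usage: one frame with two acoustic values and one lip-motion value.
fused = fuse([[1.0, 2.0]], [[3.0]])
client = ([1.0], [[1.0, 2.0, 3.0]], [[1.0, 1.0, 1.0]])
world = ([1.0], [[0.0, 0.0, 0.0]], [[1.0, 1.0, 1.0]])
accepted, score = verify(fused, client, world)
```

In practice the model parameters would be trained on enrollment data and the threshold tuned on a separate evaluation set; the toy values here only show the data flow.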
lip motion
SVM
lip reading
normal image velocity
GMM
speaker recognition
biometrics
motion estimation
speech recognition
normal image flow