On Speech-to-Text Alignment, Phonetic Labeling, and Recursive Signal Processing
Doctoral thesis, 1996
Speech databases play an important role in the study of speech communication. In particular, time-aligned and phonetically labeled speech databases are useful when designing applications such as speech recognizers. The time-alignment and labeling of speech data is traditionally done manually, a procedure that is time-consuming, tedious, and subjective. An automated labeling procedure would therefore be advantageous. In the present thesis, various aspects of the design of a system for automatic labeling of speech data are thoroughly investigated.
An automated method for phonetic labeling of speech data is presented. The method requires the orthographic transcription (text) of the speech sequence and is aided by a phonetic lexicon. It handles long speech sequences by first performing a coarse alignment between the speech signal and the corresponding text. The coarse alignment supplies the phonetic labeling system with short sequences of speech, each with its corresponding phonetic transcription as given by the phonetic lexicon. The alignment between the speech signal and the phones is found by means of a Viterbi search algorithm. The search algorithm is supported by a distance function based on a speech model that contains sets of phonetic and di-phonetic classes with associated mean vectors and covariance matrices. Furthermore, an extensive evaluation study of the automated labeling procedure is carried out, applying various signal processing methods, distance functions, and speech models. Additionally, the behavior of the automated labeling procedure is illustrated in order to indicate directions for its future development.
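The Viterbi alignment step described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it assumes one class per phone (no di-phonetic classes), diagonal covariance matrices, a strictly left-to-right phone order with at least one frame per phone, and a Gaussian negative log-likelihood as the distance function. The names `diag_gaussian_distance` and `viterbi_align` are hypothetical.

```python
import math


def diag_gaussian_distance(frame, mean, var):
    """Distance of a feature frame to a phone class: the negative
    log-likelihood (up to a constant) under a diagonal Gaussian.
    The diagonal covariance is a simplifying assumption."""
    return sum((x - m) ** 2 / v + math.log(v)
               for x, m, v in zip(frame, mean, var))


def viterbi_align(frames, phone_models):
    """Align frames to an ordered phone sequence by dynamic programming.

    frames       : list of feature vectors, one per time frame
    phone_models : list of (mean, var) pairs, in transcription order
    Returns the index of the first frame assigned to each phone.
    """
    T, N = len(frames), len(phone_models)
    if T < N:
        raise ValueError("need at least one frame per phone")
    INF = float("inf")
    cost = [[INF] * N for _ in range(T)]   # best accumulated distance
    back = [[0] * N for _ in range(T)]     # phone index at previous frame
    cost[0][0] = diag_gaussian_distance(frames[0], *phone_models[0])

    for t in range(1, T):
        for n in range(min(t + 1, N)):
            d = diag_gaussian_distance(frames[t], *phone_models[n])
            stay = cost[t - 1][n]                       # remain in phone n
            advance = cost[t - 1][n - 1] if n > 0 else INF  # enter phone n
            if advance < stay:
                cost[t][n], back[t][n] = advance + d, n - 1
            else:
                cost[t][n], back[t][n] = stay + d, n

    # Backtrack from the last phone at the last frame.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        prev = back[t][n]
        if prev != n:
            starts[n] = t   # phone n begins at frame t
            n = prev
    return starts
```

For example, with one-dimensional frames `[[0.0], [0.1], [5.0], [5.2], [9.9]]` and three unit-variance phone classes centered at 0, 5, and 10, the sketch places the phone boundaries at frames 0, 2, and 4.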
A second-order recursive algorithm for adaptive signal processing is proposed. The basic algorithm is derived and analyzed for the ARX case, and then extended to instrumental-variable and prediction-error-like algorithms. Furthermore, a similar algorithm is derived for signal subspace tracking. It is shown that the algorithm encompasses both the RLS and the LMS algorithms as special cases. The computational complexity is the same as for the RLS algorithm, but some extra memory storage is required. The associated ordinary differential equation for the ARX-case algorithm is proven to be globally exponentially stable. Furthermore, it is demonstrated that the proposed algorithm tracks time-varying systems better than the RLS algorithm. The proposed algorithm performs especially well when a system change coincides with a decrease in signal power.
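For orientation, the two special cases named above can be sketched in their standard textbook forms. This is not the second-order algorithm proposed in the thesis, whose update equations are not given in this abstract. The sketch assumes a linear regression model y(t) = phi(t)^T theta, as in the ARX case; the function names, the step size `mu`, and the forgetting factor `lam` are illustrative choices.

```python
def lms_step(theta, phi, y, mu):
    """One LMS update: a gradient step scaled by the step size mu."""
    err = y - sum(p * th for p, th in zip(phi, theta))
    return [th + mu * p * err for th, p in zip(theta, phi)]


def rls_step(theta, P, phi, y, lam=0.98):
    """One RLS update with exponential forgetting factor lam.

    theta : current parameter estimate (list of floats)
    P     : current symmetric matrix (list of lists), typically
            initialized to a large multiple of the identity
    Returns the updated (theta, P).
    """
    n = len(theta)
    # P * phi; because P is symmetric this is also (phi^T P)^T.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    err = y - sum(p * th for p, th in zip(phi, theta))
    theta = [th + g * err for th, g in zip(theta, gain)]
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P
```

Both functions take the regressor `phi` (for a first-order ARX model, for instance, the previous output and previous input) and the new output sample `y`. LMS needs only a vector update per sample, while RLS also propagates the matrix `P`, which is what gives it faster convergence at higher computational cost.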
phonetic labeling
recursive algorithms
adaptive algorithms
speech-to-text alignment
dynamic programming
time-varying systems
subspace tracking
time-alignment
parameter estimation
time-warping