Coding Speech for Packet Networks
Doctoral thesis, 2003
This thesis addresses speech coding for packet networks and the problems that arise when such networks carry voice. Real-time voice communication is highly delay sensitive: if the total end-to-end delay in a telephone session grows large, users find the conversation annoying. The Internet, as of today, is a "best-effort" network, and, in contrast to a traditional telephone channel, delays may vary throughout a conversation. Packets containing speech data that are delayed so long that they miss their scheduled playout time are effectively lost. Receivers must handle such packet loss in some way, or the subjective quality will be severely degraded.
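The playout-deadline rule described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the fixed playout delay, the `Packet` fields, and the function names are assumptions made purely for illustration. A packet counts as lost if it never arrives or arrives after its send time plus the playout delay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int                      # sequence number (hypothetical field)
    send_time_ms: float           # time the sender emitted the packet
    arrival_ms: Optional[float]   # arrival time, or None if dropped in the network


def is_lost(packet: Packet, playout_delay_ms: float) -> bool:
    """A packet is lost if it is missing or arrives after its playout deadline."""
    deadline_ms = packet.send_time_ms + playout_delay_ms
    return packet.arrival_ms is None or packet.arrival_ms > deadline_ms


def lost_seqs(packets: List[Packet], playout_delay_ms: float) -> List[int]:
    """Sequence numbers the receiver would have to conceal."""
    return [p.seq for p in packets if is_lost(p, playout_delay_ms)]


# Four 20 ms frames; packet 1 is delayed heavily, packet 2 never arrives.
packets = [
    Packet(seq=0, send_time_ms=0.0, arrival_ms=40.0),
    Packet(seq=1, send_time_ms=20.0, arrival_ms=150.0),
    Packet(seq=2, send_time_ms=40.0, arrival_ms=None),
    Packet(seq=3, send_time_ms=60.0, arrival_ms=130.0),
]

print(lost_seqs(packets, playout_delay_ms=100.0))  # late packet 1 and missing packet 2
print(lost_seqs(packets, playout_delay_ms=150.0))  # only the missing packet 2
```

Under these assumptions, a longer playout delay turns late packets into usable ones but adds to the end-to-end delay, which is the trade-off that makes loss handling at the receiver necessary.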
The packet loss problem is central to this thesis and is approached from several directions. The thesis consists of seven articles (papers A-G). Three of them (B-D) propose receiver-based packet loss concealment (PLC) methods, which can in principle be deployed in any existing system by modifying only the receivers. Paper E proposes a forward error correction system based on a secondary sub-coder, which is found to yield good results; compared to receiver-based PLC, however, it requires more bandwidth and introduces additional delay. Rather than relying on add-ons, as in papers B-E, paper G aims to design a complete speech coder from scratch, with the packet channel in mind. Many of today's coders use inter-frame coding techniques for compression efficiency. Such coders perform poorly under frame-erasure conditions, because lost internal coder states cause errors to propagate over several frames. The coder proposed in paper G avoids this by using new variable-dimension coding techniques based on Gaussian mixture (GM) models. These GM-based coding schemes are treated more generally in paper F. Gaussian mixture modeling is used frequently throughout the thesis (papers A, B, F, G) and is the sole topic of paper A, which investigates a modified GM model and a corresponding model estimation algorithm.
packet loss concealment
voice over IP
speech coding
harmonic modeling
vector quantization
sinusoidal modeling
Gaussian mixture modeling
bounded support
frame erasure
variable dimension