Coding Speech for Packet Networks
Doctoral thesis, 2003

This thesis addresses speech coding for packet networks and the problems that arise when such networks are used for voice communication. Real-time voice communication is highly delay sensitive: if the total end-to-end delay in a telephone session grows large, users perceive it as annoying. Today's Internet is a "best-effort" network and, in contrast to a traditional telephone channel, delays may vary throughout a conversation. Packets carrying speech data that are delayed so much that they miss their scheduled playout time are treated as lost. Receivers must handle packet loss in some way, or the subjective quality degrades severely. The packet loss problem is central to this thesis and is approached from several directions. The thesis consists of seven articles (papers A-G). Three of them (B-D) propose receiver-based packet loss concealment (PLC) methods, which can in principle be employed in any existing system by modifying the receivers. Paper E proposes a forward error correction system based on a secondary sub-coder and finds that it yields good results; compared with receiver-based PLC, however, it requires more bandwidth and introduces additional delay. Instead of using PLC add-ons, as in papers B-E, paper G aims to design a complete speech coder from scratch, with the packet channel in mind. Many of today's coders rely on inter-frame coding techniques for compression efficiency. Such coders perform poorly under frame-erasure conditions, because lost internal coder states cause errors to propagate over several frames. The coder proposed in paper G avoids this by using new variable-dimension coding techniques based on Gaussian mixture (GM) models. These GM-based coding schemes are treated more generally in paper F.
Gaussian mixture modeling is used throughout the thesis (papers A, B, F, and G) and is the sole topic of paper A, which investigates a modified GM model together with a corresponding model estimation algorithm.
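
The playout-deadline behavior described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the names `Packet` and `playout`, the 20 ms frame length, the 60 ms playout delay, and the assumption that frame `seq` is sent at time `seq * frame_ms` are all hypothetical, and repeating the last good frame stands in for a real concealment method rather than any of the PLC schemes in papers B-D.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int            # sequence number of the speech frame
    arrival_ms: float   # time at which the packet reached the receiver
    payload: str        # stand-in for the decoded speech frame


def playout(packets: List[Packet], n_frames: int,
            frame_ms: float = 20.0,
            playout_delay_ms: float = 60.0) -> List[str]:
    """Return the frames played out, concealing late or missing ones.

    Assumption (hypothetical): frame `seq` is sent at seq * frame_ms and
    must be played at seq * frame_ms + playout_delay_ms.
    """
    # Keep only packets that arrive on or before their playout deadline;
    # anything later is treated as lost.
    on_time: Dict[int, str] = {}
    for p in packets:
        deadline = p.seq * frame_ms + playout_delay_ms
        if p.arrival_ms <= deadline:
            on_time[p.seq] = p.payload

    output: List[str] = []
    last_good: Optional[str] = None
    for seq in range(n_frames):
        if seq in on_time:
            last_good = on_time[seq]
            output.append(last_good)
        elif last_good is not None:
            # Placeholder concealment: repeat the last good frame.
            output.append(f"concealed({last_good})")
        else:
            # Nothing received yet, so play silence.
            output.append("silence")
    return output


if __name__ == "__main__":
    received = [
        Packet(0, 10.0, "f0"),
        Packet(1, 200.0, "f1"),  # deadline is 80 ms, so this one is late
        Packet(2, 70.0, "f2"),
    ]
    print(playout(received, n_frames=4))
```

In this sketch a larger `playout_delay_ms` turns fewer late packets into losses but adds to the end-to-end delay, which is the trade-off the abstract describes.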

packet loss concealment

voice over IP

speech coding

harmonic modeling

vector quantization

sinusoidal modeling

Gaussian mixture modeling

bounded support

frame erasure

variable dimension


Jonas Lindblom

Chalmers, Department of Electromagnetics


Electrical Engineering and Electronics



Doctoral theses at Chalmers University of Technology. New series: 2056

Technical report - School of Electrical Engineering, Chalmers University of Technology, Göteborg, Sweden: 468