Spatial Room Impulse Response Processing for Virtual Acoustics
Doctoral thesis, 2026

Augmented reality (AR) and telepresence systems aim to enhance the real world with virtual elements that blend convincingly into the surrounding space. Creating virtual sound sources in this context requires presenting perceptually valid head-related and room-acoustic cues to the listener to enable a realistic spatial impression and a coherent match between the virtual acoustics and those of the physical environment. In practical AR systems, the acoustic characteristics of the environment must be estimated from available sensor signals and the virtual source rendered through acoustically transparent headphones to preserve natural sounds in the physical environment. This thesis addresses both stages of this virtual acoustic processing chain: estimation and rendering. Central to both are spatial room impulse responses (SRIRs), which describe the linear, time-invariant, and directional properties of the acoustic transfer path between a source and a receiver in an environment.

The thesis first introduces a general microphone array signal model that separates room- and array-dependent contributions using spherical or circular harmonic representations. Building on this model, a blind SRIR estimation framework is proposed that reformulates blind multichannel system identification as an informed problem through the estimation of a pseudo-reference signal. Motivated by practical AR systems that often rely on wearable devices such as head-mounted displays or smartglasses, the thesis then specifically considers microphone arrays in motion.
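To illustrate what an informed identification problem looks like once a reference signal is available, the following is a minimal sketch. It does not reproduce the thesis framework: the reference is simply assumed to be known, whereas the thesis estimates a pseudo-reference signal. The estimator is a generic regularized frequency-domain least-squares deconvolution, and the function name, parameters, and test signals are hypothetical.

```python
import numpy as np

def identify_with_reference(mics, ref, n_taps, reg=1e-8):
    """Estimate one impulse response per microphone from a known reference.

    mics: array of shape (n_samples, n_mics), microphone signals
    ref:  array of shape (n_ref,), reference signal (assumed known here)
    Returns an array of shape (n_taps, n_mics).
    """
    # Zero-pad so that circular convolution equals linear convolution.
    n_fft = mics.shape[0] + len(ref)
    S = np.fft.rfft(ref, n=n_fft)
    X = np.fft.rfft(mics, n=n_fft, axis=0)
    # Regularized least-squares solution per frequency bin and channel.
    H = X * np.conj(S)[:, None] / (np.abs(S) ** 2 + reg)[:, None]
    return np.fft.irfft(H, n=n_fft, axis=0)[:n_taps]

# Synthetic check with made-up data: 4 microphones, 16-tap decaying responses.
rng = np.random.default_rng(0)
ref = rng.standard_normal(256)
h = rng.standard_normal((16, 4)) * np.exp(-0.3 * np.arange(16))[:, None]
mics = np.stack([np.convolve(ref, h[:, m]) for m in range(4)], axis=1)

h_est = identify_with_reference(mics, ref, n_taps=16)
```

In this noise-free toy case the estimate matches the simulated responses up to the small bias introduced by `reg`. In the blind setting the reference is not given, which is the gap the pseudo-reference estimation is meant to close.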

The second part of the thesis focuses on the binaural rendering of estimated SRIRs for headphone reproduction. An array-aware end-to-end magnitude least-squares renderer is proposed to mitigate spatio-spectral coloration caused by limited spatial sampling and regularization. As an alternative to direct rendering, the thesis investigates the separation of direct sound and early reflections from an SRIR, a common processing step in parametric SRIR-based rendering that can facilitate virtual acoustic reproduction with increased directional sharpness. Two approaches are compared: one based on a physical array signal model and another based on subspace decomposition.

Together, these contributions advance practical SRIR estimation and rendering for virtual acoustics and provide foundations for robust, wearable, and perceptually convincing augmented and virtual reality audio systems.

Spatial Room Impulse Response

Room Acoustics

Room Impulse Response Estimation

Virtual Acoustics

Microphone Array

Binaural Rendering

SB-H4
Opponent: Prof. Dr.-Ing. Sebastian Schlecht, Chair of Multimedia Communications and Signal Processing, Friedrich-Alexander University Erlangen-Nürnberg, Germany

Author

Thomas Deppisch

Chalmers, Architecture and Civil Engineering, Applied Acoustics

Blind Estimation of Spatial Room Impulse Responses Using a Pseudo Reference Signal

2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings (2024), p. 470-474

Paper in proceedings

Blind Identification of Binaural Room Impulse Responses From Smart Glasses

IEEE/ACM Transactions on Audio Speech and Language Processing, Vol. 32 (2024), p. 4052-4065

Journal article

Spatial Room Impulse Response Identification from Rotating Equatorial Microphone Arrays

European Signal Processing Conference (2024), p. 116-120

Paper in proceedings

Spatial Room Impulse Response Estimation from a Moving Microphone Array

European Signal Processing Conference (2025), p. 91-95

Paper in proceedings

T. Deppisch, S. V. Amengual Garí, P. Calamia and J. Ahrens, “Identification and Matching of Room Acoustics With Moving Head-Worn Microphone Arrays,” 2026.

End-to-End Magnitude Least Squares Binaural Rendering of Spherical Microphone Array Signals

2021 Immersive and 3D Audio: From Architecture to Automotive, I3DA 2021 (2021)

Paper in proceedings

Spatial Subtraction of Reflections from Room Impulse Responses Measured With a Spherical Microphone Array

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Vol. 2021-October (2021), p. 346-350

Paper in proceedings

Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

IEEE/ACM Transactions on Audio Speech and Language Processing, Vol. 31 (2023), p. 927-942

Journal article

When two people have a conversation in the same room, it is easy for them to tell where the other person is standing, how far away they are, and what kind of environment they are in. In a typical video call, much of this information is lost. The voice is captured, transmitted, and played back in a simplified way, so it is no longer tied to a position or a space. When listening through headphones, it is often perceived as coming from inside the head rather than from the surrounding environment.

Recreating a more natural experience requires capturing and reproducing how sound behaves in a room. This can be done using a spatial room impulse response, which can be understood as an acoustic fingerprint of a space. It describes how sound travels from a source to a listener in a specific environment, including how it reflects, reverberates, and from which directions it arrives. In other words, it captures both the temporal and spatial acoustic characteristics of a room.

When combined with models of how humans hear with two ears and played back over headphones, this makes it possible to add virtual sounds to a real environment in a convincing way. This is the basis of virtual acoustics and is important for applications such as augmented reality and telepresence.
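As a rough illustration of the playback step, the sketch below convolves a dry (echo-free) signal with a two-channel impulse response, one channel per ear. The impulse response values are made up for the example and the function name is hypothetical. A real system would use measured or estimated responses and would typically also account for head movements.

```python
import numpy as np

def render_binaural(dry, brir):
    """Convolve a mono signal with a two-channel impulse response.

    dry:  array of shape (n_samples,)
    brir: array of shape (n_taps, 2), columns are left and right ear
    Returns an array of shape (n_samples + n_taps - 1, 2).
    """
    dry = np.asarray(dry, dtype=float)
    brir = np.asarray(brir, dtype=float)
    return np.stack([np.convolve(dry, brir[:, ear]) for ear in range(2)], axis=1)

# Toy impulse response (made-up values): the left ear receives the direct
# sound at sample 0 and a reflection at sample 3; the right ear receives
# both one sample later and slightly attenuated.
brir = np.zeros((5, 2))
brir[0, 0] = 1.0
brir[3, 0] = 0.5
brir[1, 1] = 0.8
brir[4, 1] = 0.4

dry = np.array([1.0, 0.0, -1.0])
out = render_binaural(dry, brir)
```

Each column of `out` is the signal for one headphone channel. The delay and level differences between the two columns are what carry the directional impression in this toy case.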

In practical applications, it is typically not possible to perform dedicated measurements, and spatial room impulse responses must be estimated from naturally occurring sounds using small, wearable microphone arrays that may be moving. This thesis focuses on estimating spatial room impulse responses from such recordings and rendering them for headphone playback while preserving the cues needed to perceive sound direction and space in a natural way. By improving both how spatial room impulse responses are captured and reproduced, this work enables remote communication that feels more like being in the same room.

Areas of Advance

Information and Communication Technology

Driving Forces

Innovation and entrepreneurship

Subject Categories (SSIF 2025)

Signal Processing

Control Engineering

DOI

10.63959/chalmers.dt/5863

ISBN

978-91-8103-406-6

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie: 5863

Publisher

Chalmers

More information

Latest update

2026-05-01