Direct-to-Reverberant Energy Ratio Estimation and Extrapolation from Own Speech
Paper in proceedings, 2025
Accurately characterizing a user's acoustic environment is essential for creating virtual sound sources in augmented reality that blend seamlessly into the real environment. The acoustic parameters of an environment can be calculated from a room impulse response (RIR), and the authors recently presented a method to blindly estimate RIRs from speech signals captured with a head-worn microphone array. The method uses either speech from a distant talker or the own speech of the person wearing the array. While both variants provide reliable reverberation time estimates, direct-to-reverberant energy ratio (DRR) estimates from the user's own speech deviate significantly from the expected DRR of a distant virtual source, because the nearby mouth produces a much higher direct sound level at the array. This study investigates whether DRR values estimated from own speech can be extrapolated to predict the DRRs of distant sources. The approach relies on two acoustic assumptions: (i) the mouth-to-array transfer paths do not change significantly between users, and (ii) the reverberant field is homogeneous. Our findings show that these assumptions hold above the Schröder frequency and in sufficiently reverberant conditions. Average DRR extrapolation errors are below 2 dB at mid frequencies when using mouth simulator measurements and around 3 dB with actual speech recordings.
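As a rough illustration of the quantities involved, the following Python sketch computes a broadband DRR from a single-channel RIR by splitting it at a fixed window around the direct-sound peak. This is a common convention, but the +/-2.5 ms window is an assumption and not necessarily the authors' choice. It then extrapolates an own-speech DRR to a farther source with a deliberately simplified, hypothetical model that is not the method of this paper. The model assumes that the reverberant energy at the array is identical for both sources and that direct energy follows the inverse-square law, with the mouth-to-array path collapsed into a fixed equivalent distance r_own_eq. The function names and parameters are hypothetical.

```python
import math
from typing import Sequence


def drr_db(rir: Sequence[float], fs: float, window_ms: float = 2.5) -> float:
    """Broadband DRR (dB) of a single-channel RIR.

    Direct part: samples within +/- window_ms of the largest-magnitude sample.
    Reverberant part: every sample after that window. Samples before the
    window (e.g. pre-onset noise) are ignored.
    """
    if not rir:
        raise ValueError("empty RIR")
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(window_ms * 1e-3 * fs))  # half-window length in samples
    start = max(0, peak - half)
    end = min(len(rir), peak + half + 1)
    direct = sum(x * x for x in rir[start:end])
    reverb = sum(x * x for x in rir[end:])
    if reverb <= 0.0:
        raise ValueError("no reverberant energy after the direct-sound window")
    return 10.0 * math.log10(direct / reverb)


def extrapolate_drr_db(drr_own_db: float, r_own_eq: float, r_far: float) -> float:
    """Hypothetical own-speech -> distant-source DRR extrapolation.

    Simplifying assumptions (NOT the paper's model):
      * the reverberant energy at the array is the same for both sources
        (homogeneous reverberant field, equal source power);
      * direct energy follows the inverse-square law, with the
        mouth-to-array path represented by a fixed equivalent distance.
    Under these assumptions only the direct term changes, by
    -20*log10(r_far / r_own_eq) dB.
    """
    if r_own_eq <= 0.0 or r_far <= 0.0:
        raise ValueError("distances must be positive")
    return drr_own_db - 20.0 * math.log10(r_far / r_own_eq)


if __name__ == "__main__":
    fs = 48_000
    rir = [0.0] * 1000
    rir[10] = 1.0    # direct sound
    rir[500] = 0.1   # a single late reflection standing in for the reverb tail
    own = drr_db(rir, fs)
    print(f"own-speech DRR: {own:.1f} dB")
    print(f"extrapolated to 2 m: {extrapolate_drr_db(own, 0.1, 2.0):.1f} dB")
```

For example, with an equivalent own-speech distance of 0.1 m and a target distance of 1 m, this simplified model lowers the DRR by 20 dB. Real mouth-to-array paths involve near-field, mouth directivity, and head-shadowing effects that the inverse-square model ignores, and a frequency-dependent analysis would apply the same split per band rather than broadband.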
Room Acoustics
Augmented Reality
Direct-to-Reverberant Energy Ratio
Room Impulse Response