AUDIO-VISUAL INTERACTIONS IN DYNAMIC SCENES: IMPLICATIONS FOR MULTISENSORY COMPRESSION
Paper in proceedings, 2007
New media technologies enrich human capabilities for the generation, transmission and representation of audio-visual content. Knowledge about human sensory and cognitive processing is critical to advancing these technologies. Traditionally, media technologies have adopted a unimodal view of human perception, using, for example, separate compression modules for audio and visual data streams. Drawing on advances in neuroscience that have revealed the strongly multisensory nature of human perception, we suggest that audio-visual content processing might benefit from a multimodal approach tailored to the rules of human multisensory processing. While visual dominance in the spatial domain is well known for static scenes, as in the famous “ventriloquism effect”, more complex interactions emerge for dynamic audio-visual stimuli. In this paper we first review studies on the “dynamic ventriloquism” effect, in which visual motion captures the perceived direction of auditory motion. Second, we show how rhythmic sound patterns fill in temporally degraded visual motion, based on the recently discovered “auditory-induced visual flash illusion”. Finally, we discuss the implications of these findings for multisensory processing and compression techniques.