AiCommentator: A Multimodal Conversational Agent for Embedded Visualization in Football Viewing
Paper in proceeding, 2024
Traditionally, sports commentators provide viewers with diverse information, encompassing in-game developments and player performances. Yet young adult football viewers increasingly use mobile devices for deeper insights during football matches. Such insights into players on the pitch and performance statistics support viewers' understanding of game stakes, creating a more engaging viewing experience. Inspired by commentators' traditional roles and to incorporate information into a single platform, we developed AiCommentator, a Multimodal Conversational Agent (MCA) for embedded visualization and conversational interactions in football broadcast video. AiCommentator integrates embedded visualization, either with an automated non-interactive or with a responsive interactive commentary mode. Our system builds upon multimodal techniques, integrating computer vision and large language models, to demonstrate ways for designing tailored, interactive sports-viewing content. AiCommentator's event system infers game states based on a multi-object tracking algorithm and computer vision backend, facilitating automated responsive commentary. We address three key topics: evaluating young adults' satisfaction and immersion across the two viewing modes, enhancing viewer understanding of in-game events and players on the pitch, and devising methods to present this information in a usable manner. In a mixed-method evaluation (n=16) of AiCommentator, we found that the participants appreciated aspects of both system modes but preferred the interactive mode, expressing a higher degree of engagement and satisfaction. Our paper reports on our development of AiCommentator and presents the results from our user study, demonstrating the promise of interactive MCA for a more engaging sports viewing experience. Systems like AiCommentator could be pivotal in transforming the interactivity and accessibility of sports content, revolutionizing how sports viewers engage with video content.
Multimodal Conversational Agent
Embedded Visualization
Interaction
Computer Vision
Deep Learning
Conversational User Interface
Multi-Object Tracking
Usability Testing
Human-Computer