glemöhnic
Other conference contribution, 2024
This piece explores how extra-normal vocal sounds provoke nonsensical AI mediations of human vocality, by using non-text audio as input material for text-expectant AI speech recognition and synthesis models. The mediation of non-textual human voice gestures by these ASR models yields eclectic, bizarre and poetic nonsense, which is then used as textual input for text-to-speech synthesis and voice cloning models. Coqui TTS's XTTS_V2 model reconstructs the syllabic, phonemic and garbled poems into vocal clones that oscillate between their reference audio (the original audio input) and the scraped audio data that the XTTS_V2 model has been trained on. The result is a collection of original and cloned audio samples that are used as sonic material in a live-coded musical performance on the strudel REPL platform.
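A minimal sketch of the two-stage pipeline described above, in Python. The abstract does not name the ASR system used, so OpenAI's Whisper stands in here as one example of a "text-expectant" recognition model; file paths are placeholders, and the live-coded strudel stage is not shown.

import whisper
from TTS.api import TTS

# Stage 1 (assumed ASR: Whisper): force a text-expectant speech recognition
# model to transcribe non-textual, extra-normal vocal gestures into nonsense text.
asr = whisper.load_model("base")
garbled_poem = asr.transcribe("extra_normal_voice.wav")["text"]  # placeholder path

# Stage 2: feed the resulting nonsense text into XTTS_V2, using the original
# recording as the voice-cloning reference audio.
xtts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
xtts.tts_to_file(
    text=garbled_poem,
    speaker_wav="extra_normal_voice.wav",
    language="en",
    file_path="cloned_output.wav",
)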
research-through-design
AI
voice synthesis
musical AI
speech recognition
Author
Kelsey Cotton
Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI
International Conference on AI and Musical Creativity
Oxford, United Kingdom
Subject Categories (SSIF 2011)
Performing Arts
Music
Computer Science
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)