glemöhnic
Other conference contribution, 2024
This piece explores how extra-normal vocal sounds trigger and provoke nonsensical AI mediations of human vocality, by using non-text audio as input material for text-expectant AI speech recognition and synthesis models. The mediation of non-textual human voice gestures by these ASR models yields eclectic, bizarre and poetic nonsense, which is then used as textual input for text-to-speech synthesis and voice cloning models. Coqui TTS' XTTS_v2 model re-constructs the syllabic, phonemic and garbled poems into vocal clones that oscillate between their reference audio (the original audio dataset input) and the scraped audio data on which the XTTS_v2 model has been trained. The result is a collection of original and cloned audio samples that are used as sonic material in a live coded musical performance on the Strudel REPL platform.
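A minimal sketch of the pipeline described above, under stated assumptions: the abstract does not name the ASR model, so Whisper is used here purely as a stand-in recogniser, and the file names are hypothetical placeholders. The voice cloning step uses the Coqui TTS XTTS_v2 model named in the abstract.

```python
# Sketch: non-text vocal audio -> text-expectant ASR -> nonsense text ->
# voice-cloning TTS conditioned on the original recording.
import whisper          # stand-in ASR; the piece's actual recognisers are not specified here
from TTS.api import TTS  # Coqui TTS, providing the XTTS_v2 voice cloning model

# 1. Feed an extra-normal vocal gesture to a text-expectant speech recogniser.
#    The model "hears" words where there are none, yielding a garbled poem.
asr = whisper.load_model("base")
nonsense_text = asr.transcribe("extra_normal_vocal_gesture.wav")["text"]

# 2. Hand the nonsense text to XTTS_v2, using the original recording as the
#    reference speaker audio, so the clone oscillates between the reference
#    voice and the model's own training data.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text=nonsense_text,
    speaker_wav="extra_normal_vocal_gesture.wav",
    language="en",
    file_path="cloned_nonsense.wav",
)

# 3. The original and cloned samples can then be loaded as sonic material
#    in a live coding environment such as Strudel.
```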
research-through-design
AI
voice synthesis
musical AI
speech recognition
Author
Kelsey Cotton
Chalmers, Computer Science and Engineering, Data Science and AI
International Conference on AI and Musical Creativity
Oxford, United Kingdom
Subject categories
Performing arts
Music
Computer science
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)