glemöhnic
Other conference contribution, 2024
This piece explores how extra-normal [undefined] vocal sounds trigger and provoke nonsensical AI-mediations of human vocality, by using non-text audio as input material for text-expectant AI speech recognition and synthesis models. The mediation of non-textual human voice gestures by these ASR models yields eclectic, bizarre and poetic nonsense, which in turn serves as textual input for text-to-speech synthesis and voice cloning models. CoquiTTS’ XTTS_V2 model reconstructs the syllabic, phonemic and garbled poems into vocal clones that oscillate between their reference audio (the original audio dataset input) and the scraped audio data that the XTTS_V2 model has been trained on. The result is a collection of original and cloned audio samples that become the sonic material for a live coded musical performance on the strudelREPL platform.
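The chain described above (non-textual recording, ASR transcript, voice clone) can be sketched roughly as follows. This is a minimal illustration, not the artist's actual code: the abstract does not name the ASR model, so both stages are passed in as plain callables, and the names `mediate`, `SamplePair`, `Transcriber` and `Cloner` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

# Hypothetical stand-ins: the abstract names XTTS_V2 for synthesis but does not
# name the ASR model, so both stages are injected as plain callables.
Transcriber = Callable[[Path], str]          # audio file -> (nonsense) text
Cloner = Callable[[str, Path, Path], None]   # text, reference audio, output path


@dataclass
class SamplePair:
    original: Path   # the non-textual vocal recording
    text: str        # what the ASR model "heard"
    clone: Path      # synthesised voice clone speaking that text


def mediate(recordings: List[Path], transcribe: Transcriber,
            clone_voice: Cloner, out_dir: Path) -> List[SamplePair]:
    """Run each recording through ASR, then re-voice the transcript."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for wav in recordings:
        text = transcribe(wav)
        if not text.strip():
            continue  # the ASR stage returned nothing to synthesise
        target = out_dir / f"{wav.stem}_clone.wav"
        # The source recording doubles as the cloning reference audio.
        clone_voice(text, wav, target)
        pairs.append(SamplePair(wav, text, target))
    return pairs
```

With Coqui TTS installed, the `clone_voice` callable could plausibly wrap `TTS("tts_models/multilingual/multi-dataset/xtts_v2").tts_to_file(text=..., speaker_wav=..., language=..., file_path=...)`; the choice of `language` is an assumption, since the abstract does not state one. The resulting original and cloned files would then be loaded as samples in the live coding environment.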
research-through-design
AI
voice synthesis
musical AI
speech recognition
Author
Kelsey Cotton
Chalmers, Computer Science and Engineering, Data Science and AI
International Conference on AI and Musical Creativity
Oxford, United Kingdom
Subject categories (SSIF 2011)
Performing Arts
Music
Computer Sciences
Infrastructure
C3SE (Chalmers Centre for Computational Science and Engineering)