Hello Convai,
I am using your solution in combination with an ElevenLabs voice.
However, when I ask a question, the answer sometimes comes back as audio that sounds like a “slow-motion voice”. ElevenLabs support told me this might be related to long answers of more than 800 characters, but I already get this effect at around 300 characters per answer. Is there a way to query the LLM and forward the answer to ElevenLabs in fragments (e.g. sentence by sentence), retrieving one audio clip for each sentence?
Or are there other solutions to this problem?
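To make the sentence-by-sentence idea concrete, here is a rough Python sketch of the splitting step only. The function names and the 250-character limit are my own placeholders, not anything from Convai or ElevenLabs, and the regex split is naive (it will break on abbreviations such as "z. B."). Each returned chunk would then be sent to the voice service as its own request.

```python
import re


def split_into_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_for_tts(text: str, max_chars: int = 250) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    max_chars is a hypothetical limit chosen for illustration. A single
    sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_into_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    answer = "Hallo. Wie geht es dir? Gut!"
    for chunk in chunk_for_tts(answer, max_chars=10):
        print(chunk)  # one text-to-speech request per printed chunk
```

With a small limit every sentence becomes its own chunk; with a larger limit short sentences are merged, which keeps the number of audio requests down.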
Best regards,
MA
Hello @info_AM,
Thanks for reaching out, and sorry to hear you’re encountering this issue.
To better understand the problem, could you please share a sample audio or video recording where the issue occurs? Additionally, let us know whether this happens within the chatbox on convai.com or inside your engine integration (e.g., Unity or Unreal Engine).
If possible, please try replicating the issue on convai.com directly. That would help us determine whether the problem is with the voice generation itself or how it’s being handled in your project environment.
Hello K3,
Sure - I tested it with the chat on the Convai website and got similar results.
Please find the video attached (it's in German, but even without knowing the language you can clearly hear the “problem”).
Details on the audio bugs in each answer:
Answer 01: the gaps between the words grow longer and longer over the course of the answer, and those long gaps distort the typical sound of the voice. Later in this answer, in the last sentence, the voice itself takes on a slow-motion-like effect.
Answer 02: here I explicitly added “reply as short answer” (in German, “kurze Antwort”) to my question, but even the short answer has audio issues (slow-motion effect) at the end.
Answer 03: again I added “reply as short answer”. This time the voice has no audio bugs and sounds as expected.
Answer 04: same quality as 03, but the last word slows down a bit.
Answer 05: okay-ish.
Answer 06: a medium-length answer. It's okay, but not as good as 03.
Answer 07: a lot of audio issues in the second half. The tone of the voice turns strange, the gaps between the words get longer, and the voice gets slower overall.
We get similar output issues in the Unity integration.
Overall, the answers are (to us) very short, and we didn't expect any trouble here. ElevenLabs support tells me up to 800 characters should be okay. The first answer in my example has 338 characters and therefore should be no problem at all!
Please upload it to Google Drive.
Here is the link:
Meanwhile, ElevenLabs also added some context to this issue:
"Hi Markus,
Thank you for reaching out, I will be glad to look into this further for you.
This issue may likely be occurring when using extreme values for the ‘Stability’ and ‘Similarity’ setting parameters.
Keeping the ‘Stability’ value at 50% and the ‘Similarity’ value at 75% will provide the most consistent speech cadence.
If the issue persists, we also suggest keeping the ‘Style Exaggeration’ value at 0% as this feature can also cause inconsistencies.
Please let us know if you experience any further trouble after changing these setting parameters and we will be glad to look into this further for you.
Patryk
ElevenLabs | Customer Support"
But to really tackle this problem we need to iterate over different parameter settings, which is currently not possible for us (only with the help of your team). So direct access to the voice parameters would be highly appreciated! Maybe even a dynamic setting per voice query: when talking to an LLM we would also want to send voice parameters, so that we can use different settings for long answers than for short ones.
I'm out of the office for a few days now.
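To illustrate what I mean by per-query settings, here is a rough Python sketch that only builds the request body and does not send anything. It assumes direct access to the ElevenLabs text-to-speech REST endpoint (`POST /v1/text-to-speech/{voice_id}` with a `voice_settings` object), which is not how we reach the voice through Convai today. The defaults are the values from the support reply above. The function name, the 300-character threshold, the long-answer override and the model ID are placeholders of mine for experimentation, not recommendations from either team.

```python
import json

# Values suggested in the ElevenLabs support reply (50% / 75% / 0%).
DEFAULT_SETTINGS = {"stability": 0.5, "similarity_boost": 0.75, "style": 0.0}

# Hypothetical override to experiment with for longer answers.
LONG_ANSWER_OVERRIDES = {"stability": 0.6}


def build_tts_payload(
    text: str,
    long_answer_chars: int = 300,               # assumed threshold
    model_id: str = "eleven_multilingual_v2",   # assumed model for German text
) -> dict:
    """Build a JSON-serialisable request body with length-dependent settings."""
    settings = dict(DEFAULT_SETTINGS)
    if len(text) > long_answer_chars:
        settings.update(LONG_ANSWER_OVERRIDES)
    return {"text": text, "model_id": model_id, "voice_settings": settings}


if __name__ == "__main__":
    print(json.dumps(build_tts_payload("Kurze Antwort."), indent=2))
```

Something like this would let us try different values per answer length ourselves instead of asking your team for each change.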