Hello Convai,
I am using your solution in combination with an ElevenLabs voice.
However, when I ask a question, the answer sometimes comes back as audio that sounds like a “slow-motion voice”. ElevenLabs support told me this might be related to long answers of more than 800 characters, but I already get this effect at around 300 characters per answer. Is there a way to query the LLM and forward the answer to ElevenLabs in fragments (e.g. sentence by sentence), retrieving one audio clip per sentence?
Or are there other solutions to this problem?
Best regards,
MA
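A minimal Python sketch of the sentence-by-sentence idea, assuming the complete LLM answer is available as text before it is sent to text-to-speech. The function name `split_into_chunks`, the naive regex split on `.`, `!` and `?`, and the 300-character default cap are illustrative assumptions, not Convai or ElevenLabs behavior; each returned chunk would then be sent as its own TTS request.

```python
import re


def split_into_chunks(answer: str, max_chars: int = 300) -> list[str]:
    """Split an LLM answer into sentence-based chunks for separate TTS requests.

    Sentences are detected naively: text ending in '.', '!' or '?' followed by
    whitespace. Consecutive sentences are merged while the chunk stays within
    max_chars; a single sentence longer than max_chars is kept whole rather
    than being cut mid-sentence.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk: close it, start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# One chunk per sentence when the cap is tiny:
print(split_into_chunks("Hallo! Wie geht es dir? Mir geht es gut.", max_chars=1))
```

Merging short sentences up to the cap keeps the number of requests down. The trade-off is that each clip is synthesized without the intonation context of its neighbors, so sentence boundaries may sound slightly more abrupt.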
Hello @info_AM,
Thanks for reaching out, and sorry to hear you’re encountering this issue.
To better understand the problem, could you please share a sample audio or video recording where the issue occurs? Additionally, let us know whether this happens within the chatbox on convai.com or inside your engine integration (e.g., Unity or Unreal Engine).
If possible, please try replicating the issue on convai.com directly. That would help us determine whether the problem is with the voice generation itself or how it’s being handled in your project environment.
Hello K3,
Sure, I tested it with the chat on the Convai website and got similar results.
Please find the video attached (it's in German, but even without knowing the language you can clearly hear the “problem”).
Details on the audio bugs in each answer:
Answer 01: the gaps between the words become longer and longer over the course of the answer, and these long gaps distort the typical sound of the voice. Later in this answer (the last sentence), the voice itself takes on slow-motion-like effects.
Answer 02: here I explicitly added “reply as short answer” (“kurze Antwort” in German) to my question, but even the short answer has audio issues (slow-motion effect) at the end.
Answer 03: again I added “reply as short answer”; this time the voice has no audio bugs and sounds as expected.
Answer 04: same quality as 03, but the last word slows down a bit.
Answer 05: okay-ish.
Answer 06: a medium-length answer; it's okay, but not as good as 03.
Answer 07: a lot of audio issues in the second half. The voice's tone turns strange, the gaps between the words get longer, and the voice is slower overall.
We get similar output issues in the Unity integration.
Overall, the answers are (to us) very short, and we didn't expect any trouble here. ElevenLabs support told me that up to 800 characters should be okay, and the first answer in my example has only 338 characters, so it should be no problem at all!
Please upload it to Google Drive.
Here is the link:
Meanwhile, ElevenLabs also added some context to this issue:
"Hi Markus,
Thank you for reaching out, I will be glad to look into this further for you.
This issue may likely be occurring when using extreme values for the ‘Stability’ and ‘Similarity’ setting parameters.
Keeping the ‘Stability’ value at 50% and the ‘Similarity’ value at 75% will provide the most consistent speech cadence.
If the issue persists, we also suggest keeping the ‘Style Exaggeration’ value at 0% as this feature can also cause inconsistencies.
Please let us know if you experience any further trouble after changing these setting parameters and we will be glad to look into this further for you.
Patryk
ElevenLabs | Customer Support"
But to really tackle this problem, we need to iterate over different parameter settings, which at the moment is not possible for us (only with the help of your team). So direct access to the voice parameters would be highly appreciated! Maybe even dynamic settings per voice query: when talking to the LLM, we would also like to send voice parameters, so that we can use different settings for long answers than for short ones.
I'm out of office for the next few days.
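A sketch of what per-query voice settings could look like on the client side, assuming an integration that lets the caller pass settings with each request (the thread says this is currently only possible through the Convai team). The defaults follow the ElevenLabs advice quoted above (stability 50%, similarity 75%, style exaggeration 0%) and the field names mirror ElevenLabs' `voice_settings` keys; `VoiceSettings`, `settings_for_answer`, the 300-character threshold and any override values are hypothetical.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class VoiceSettings:
    # Defaults follow the ElevenLabs support advice quoted above.
    stability: float = 0.50
    similarity_boost: float = 0.75
    style: float = 0.0  # "Style Exaggeration"

    def __post_init__(self) -> None:
        # All three sliders are fractions between 0 and 1.
        for name, value in asdict(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def settings_for_answer(answer: str, base: VoiceSettings,
                        long_overrides: dict, long_threshold: int = 300) -> VoiceSettings:
    """Return base settings for short answers and base + overrides for long ones.

    long_threshold and long_overrides are hypothetical knobs: the thread does not
    establish which values actually help for long answers.
    """
    if len(answer) > long_threshold:
        return replace(base, **long_overrides)  # replace() re-runs validation
    return base
```

For example, `settings_for_answer(answer, VoiceSettings(), {"stability": 0.6})` would raise stability only for answers longer than 300 characters; 0.6 is a placeholder, not a tested recommendation.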
Hello there,
we still get this ugly “slow-motion” audio output, even with just three lines of text as a response (tested on the Convai webpage with a “test call” to the character).
According to the ElevenLabs support feedback, this should not happen, and for such short audio replies it also “never” happens on the ElevenLabs website itself, only when the audio reply becomes much longer. So why does it happen with Convai?
Hello again,
Is there a possibility to debug this together? What is Convai doing behind the scenes regarding voice generation? The difference between the voice generated via Convai and via ElevenLabs (using the exact same text) is huge: the audio returned via Convai (voice speed etc.) is shockingly bad in about 50% of cases, while generating the same text-to-speech directly on ElevenLabs gives a bad sample only about 10% of the time.
Sample via Convai:
Sample via ElevenLabs:
The Convai playback was recorded with my smartphone, while the ElevenLabs version was simply downloaded, so the background noise is not the important issue; what matters is the voice, its dynamics, and its speed.
The Convai output sounds like slow motion, and the whole answer takes 24 seconds instead of the 10 to 12 seconds that would be normal talking speed. In this case the words were simply spoken slowly with long gaps in between; in other cases the words themselves get stretched and it sounds even worse!
Best regards,
Markus
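The durations above amount to a slowdown factor of roughly 24 / 12 = 2.0 to 24 / 10 = 2.4. A small check along these lines could flag such clips automatically, assuming you can measure each clip's duration in seconds. The function names, the 15 characters per second speaking rate, and the 1.5 tolerance are assumed placeholder values, not measured figures; calibrate them against clips that sound right.

```python
def slowdown_factor(text: str, audio_seconds: float, chars_per_second: float = 15.0) -> float:
    """Ratio of measured clip length to the length expected at a normal speaking rate.

    chars_per_second is an assumed average rate: 1.0 means normal speed,
    2.0 means the clip takes twice as long as expected.
    """
    if not text or chars_per_second <= 0:
        raise ValueError("text must be non-empty and chars_per_second positive")
    expected_seconds = len(text) / chars_per_second
    return audio_seconds / expected_seconds


def looks_like_slow_motion(text: str, audio_seconds: float,
                           chars_per_second: float = 15.0, tolerance: float = 1.5) -> bool:
    """Flag clips that are more than `tolerance` times longer than expected."""
    return slowdown_factor(text, audio_seconds, chars_per_second) > tolerance
```

With these defaults, a 165-character answer is expected to take 11 seconds, so a 24-second clip of it gives a factor of about 2.2 and is flagged, while an 11-second clip is not.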
After testing a bit more with the current parameters (I am not sure whether you applied them in the past three days), which are:
Voice model: Eleven Multilingual v2
Speed: 1.11
Stability: 0.50
Similarity: 0.80
Style exaggeration: 0.0
Speaker boost: true
I changed the speed value to 1.02, and (at least on ElevenLabs) this is now my preferred parameter setup for this voice:
Voice model: Eleven Multilingual v2
Speed: 1.02
Stability: 0.50
Similarity: 0.80
Style exaggeration: 0.0
Speaker boost: true
Please apply this to our character and confirm the change, since we never got feedback on our past requests to change the parameters. Manual parameter setup would be a highly demanded feature, I guess.
Please confirm whether you applied the settings.
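To compare Convai's output with direct ElevenLabs generation on identical input, the preferred setup listed above can be expressed as a text-to-speech request body. This is a sketch assuming the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` and its `voice_settings` field names (Similarity → `similarity_boost`, Style exaggeration → `style`, Speaker boost → `use_speaker_boost`); verify them against the current ElevenLabs API reference. `build_tts_body` is a hypothetical helper that only builds the JSON; sending it with an API key is left out.

```python
import json
from typing import Optional

# Preferred setup from the post above, using the field names of the
# ElevenLabs text-to-speech API's voice_settings object (verify before use).
PREFERRED_SETTINGS = {
    "stability": 0.50,
    "similarity_boost": 0.80,
    "style": 0.0,  # "Style exaggeration"
    "use_speaker_boost": True,
    "speed": 1.02,
}


def build_tts_body(text: str, settings: Optional[dict] = None,
                   model_id: str = "eleven_multilingual_v2") -> str:
    """Return the JSON request body for one text-to-speech call (not sent)."""
    voice_settings = dict(PREFERRED_SETTINGS if settings is None else settings)
    body = {"text": text, "model_id": model_id, "voice_settings": voice_settings}
    # ensure_ascii=False keeps German umlauts readable in the serialized body.
    return json.dumps(body, ensure_ascii=False)


print(build_tts_body("Guten Tag, wie kann ich helfen?"))
```

Generating the same text with identical settings on both paths makes it easier to tell whether the slowdown comes from the voice settings or from how the audio is handled afterwards.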
@info_AM, yes.
We still have the issue: the audio returned for requests sent through Convai to ElevenLabs sounds like “slow motion” and is ten times worse than on ElevenLabs. Did you have a chance to investigate this?
To be clear, I don't mean the requests are slow, but that the returned audio files (as described above) sound like slow motion.
Hi @info_AM, the voice settings were set to an older settings request. We have now updated them to the latest settings you requested. Kindly check.
@info_AM, did you check?