Problem when prepending audio prefix before VAD mic stream

Fahimeh_TL · September 18, 2025, 8:54am

I’m integrating Convai with Unity and using a VAD-based audio pipeline that works fine on its own.

The issue appears when I add an audio prefix:

At the start of the utterance, I push a short PCM clip that says “translate”.
After that, VAD continues and streams mic audio as usual.

Without the prefix → everything works (full sentences are transcribed).
With the prefix + VAD → the server seems to finalize the utterance right after the prefix, and in the recent memory I only see "Translate.".

Since the server is a black box and I can’t see exactly what it receives internally, I’d like to understand why this cutoff happens and what the recommended way is to prepend a prefix without losing the rest of the utterance.
I checked my code and the sample rate is 16000 Hz for both the mic audio and the prefix clip.
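
To make the setup concrete, here is a minimal sketch of the client-side framing described above, written in Python only for brevity (the real pipeline is Unity/C#). It is not Convai's API and it does not explain the cutoff; it only shows one thing that can be ruled out on the client: that the prefix is sent in the same fixed-size frames as the mic audio, with no gap in between. The 16-bit mono format, the 20 ms frame length, and all function names are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16000    # Hz, as stated in the post
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono PCM
FRAME_MS = 20          # assumption: hypothetical frame duration


def frame_size_bytes(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of 16-bit mono PCM."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


def split_into_frames(pcm: bytes, size: int) -> List[bytes]:
    """Split raw PCM into fixed-size frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(pcm), size):
        frame = pcm[start:start + size]
        if len(frame) < size:
            # Pad with digital silence so every frame has the same length.
            frame += b"\x00" * (size - len(frame))
        frames.append(frame)
    return frames


def prefixed_stream(prefix_pcm: bytes, mic_frames: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Yield the prefix frames first, then the mic frames, as one sequence."""
    for frame in split_into_frames(prefix_pcm, size):
        yield frame
    for frame in mic_frames:
        yield frame
```

With these assumptions, `frame_size_bytes()` is 640 bytes (320 samples of 2 bytes each), and every frame produced by `prefixed_stream` would be pushed to the same open stream before the client signals the end of the utterance.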

________________
My goal is to have a character that translates between two languages.
If I rely only on the objective of the section, the model sometimes misunderstands — for example, if the input is a question, it may try to answer it itself instead of translating.

When I prepend the word “translate” at the beginning of the prompt, it works reliably.
With text input, this works fine.
But to reduce latency, I need to do this through voice — and since I can’t send text + voice in one stream, I tried using an audio prefix that says “translate” before the mic audio.

K3 · September 22, 2025, 10:31am

Hello @Fahimeh_TL,

Welcome to the Convai Developer Forum!

Unfortunately, external setups like the one you described fall outside our support scope.

Fahimeh_TL · September 23, 2025, 1:16pm

Thanks.
Could you clarify the server-side rules for detecting the end of an utterance during audio streaming?
Specifically:
1) Does it expect a final end-of-stream signal from the client (CompleteAsync()), or can it terminate automatically for other reasons?
2) Are there size/time limits for a single utterance (e.g., max seconds of audio, max bytes)?
