I selected Gemini 2.5 Flash Live as my avatar’s foundation model (which supposedly processes audio natively instead of converting between text and speech on every turn, and can additionally analyze the emotional cadence of the user’s speech).
However, the speech generation is still laggy, and it clearly lacks the ability to recognize the emotional tone of my speech. Am I missing something?
I’m using a sample scene that came with the Convai package, running in Unity 6 on Windows 11.
I created a character in Playground and selected Gemini 2.5 Flash Live as the foundation model, expecting the speech latency to improve and the avatar to gain some ability to recognize the user’s emotional tone. However, the latency was just as bad, and there was no sign of any emotional-tone recognition.
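For reference, this is the kind of minimal probe I’d use to put a number on the lag (a sketch only — OnUserSpeechEnded and OnNpcAudioStarted are hypothetical hooks, not real Convai callbacks; wire them to whatever events your integration actually raises):

```csharp
using UnityEngine;

// Minimal sketch for measuring response latency.
// OnUserSpeechEnded / OnNpcAudioStarted are hypothetical hooks:
// connect them to whatever events your Convai integration exposes.
public class LatencyProbe : MonoBehaviour
{
    float speechEndTime = -1f;

    // Call this when the user stops speaking (e.g. on mic release).
    public void OnUserSpeechEnded()
    {
        speechEndTime = Time.realtimeSinceStartup;
    }

    // Call this when the avatar's response audio starts playing.
    public void OnNpcAudioStarted()
    {
        if (speechEndTime < 0f) return;
        float latency = Time.realtimeSinceStartup - speechEndTime;
        Debug.Log($"Response latency: {latency:F2}s");
        speechEndTime = -1f;
    }
}
```

Hooking the two methods up via UnityEvents, or direct calls from wherever mic input and playback are handled, is enough to log the gap per turn.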
FYI, I’m currently testing out Convai in Unity using the free plan.
My character ID: 58698aac-125c-11f1-9b66-42010a7be02c
I’d appreciate it if you could shed some light on this.
I’ve updated the Convai Unity plugin to v3.3.3, selected Gemini 2.5 Flash Live as my avatar’s foundation model, and tested it, but my avatar still can’t discern sounds made by the user (which suggests the processing is still text-based rather than native audio processing?)
I tested the same thing with my Gemini chatbot, and it does demonstrate that ability.
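For anyone trying to reproduce this: to rule out the capture side, I’d first confirm Unity is picking up the mic at all, independent of Convai. A minimal loopback sketch using Unity’s built-in Microphone API (nothing here is Convai-specific):

```csharp
using UnityEngine;

// Sanity check: confirm Unity is capturing mic input at all,
// independent of Convai. Loops the default microphone back
// through an AudioSource so you can hear the raw capture.
[RequireComponent(typeof(AudioSource))]
public class MicLoopback : MonoBehaviour
{
    void Start()
    {
        if (Microphone.devices.Length == 0)
        {
            Debug.LogWarning("No microphone detected.");
            return;
        }

        var source = GetComponent<AudioSource>();
        // null device name = system default mic; 10 s looping buffer at 16 kHz.
        source.clip = Microphone.Start(null, true, 10, 16000);
        source.loop = true;

        // Wait until the mic actually starts recording before playback.
        while (Microphone.GetPosition(null) <= 0) { }
        source.Play();
    }
}
```

If you can hear yourself through the AudioSource, the capture side is fine and any limitation is upstream of Unity.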
Apologies, I somehow missed the bit about the beta package.
Yes, I finally got it to work. Visual perception now works alongside audio perception as well, and latency is better too.
Really looking forward to the upcoming lip-sync feature, as I’m planning to build an embodied AI avatar that speaks with lip-synced animation and can look around and understand the virtual scene.
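In the meantime, a crude stopgap for mouth movement is to drive a blendshape from the loudness of the response audio. A minimal sketch, assuming your rig has a jaw-open blendshape (index 0 here) and the NPC’s voice plays through an AudioSource on the same GameObject — neither of which is Convai-specific:

```csharp
using UnityEngine;

// Crude amplitude-driven mouth movement while waiting for real lip-sync.
// Assumes a SkinnedMeshRenderer with a jaw-open blendshape at index 0
// and the NPC's voice playing through an AudioSource on this GameObject.
[RequireComponent(typeof(AudioSource))]
public class AmplitudeMouth : MonoBehaviour
{
    public SkinnedMeshRenderer face;
    public int jawBlendShapeIndex = 0;
    public float gain = 400f;

    readonly float[] samples = new float[256];
    AudioSource voice;

    void Start()
    {
        voice = GetComponent<AudioSource>();
    }

    void Update()
    {
        // Read the most recent output samples and compute a rough loudness.
        voice.GetOutputData(samples, 0);
        float sum = 0f;
        foreach (float s in samples) sum += s * s;
        float rms = Mathf.Sqrt(sum / samples.Length);

        // Map loudness to blendshape weight (0-100 in Unity).
        float weight = Mathf.Clamp(rms * gain, 0f, 100f);
        face.SetBlendShapeWeight(jawBlendShapeIndex, weight);
    }
}
```

It won’t produce real visemes, but it keeps the mouth from staying frozen until the proper lip-sync feature lands.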