Audio (mp3) input for NPC

Similar to this question (Text box to have Convai say whatever i write - #2 by K3), I want to be able to play an MP3 file and show the text.

I created this MP3 at ElevenLabs with pauses between the paragraphs. I want to play it and also show the transcript in the UI. How can I do this?
Is it possible to use EnqueueResponse in the ConvaiNPC.cs script? Or AddResponseAudio in ConvaiNPCAudioManager?

Ok, I have it working like this. Please review and tell me why I need to use
SetWaitForCharacterLipSync(false)
and whether that could cause any conflicts? Right now it seems to work.

convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);
convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
{
    AudioClip = _introductionRoom.introductionClip,
    AudioTranscript = _introductionRoom.introductionTranscript,
    IsFinal = false
});

introductionClip is my .mp3 file (imported as an AudioClip) and introductionTranscript is the text as a string.
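To avoid repeating the two calls, the snippet above can be wrapped in a small helper. This is only a sketch: PlayLocalClip is a name I made up, and the IsFinal value simply follows the working snippet above.

```csharp
// Hypothetical helper wrapping the working snippet above — not part of the SDK.
private void PlayLocalClip(ConvaiNPC convaiNPC, AudioClip clip, string transcript)
{
    // Skip waiting for lip-sync data, since no server response will provide any.
    convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);

    convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
    {
        AudioClip = clip,             // locally imported .mp3 AudioClip
        AudioTranscript = transcript, // text shown in the UI
        IsFinal = false               // as in the working snippet above
    });
}
```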

In the PlayAudioInOrder coroutine, the while (_waitForCharacterLipSync) statement waits for the lip-sync data to arrive. SetWaitForCharacterLipSync(false) controls this loop, allowing processing to continue.
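For context, the gating pattern described here looks roughly like this. This is an illustrative sketch, not the actual ConvaiNPCAudioManager source; only the names SetWaitForCharacterLipSync and PlayAudioInOrder come from the thread.

```csharp
// Illustrative sketch of the gating pattern — not the actual SDK source.
private bool _waitForCharacterLipSync = true;

public void SetWaitForCharacterLipSync(bool wait) => _waitForCharacterLipSync = wait;

private IEnumerator PlayAudioInOrder()
{
    // Block playback until lip-sync data has arrived, or until the flag
    // is cleared externally via SetWaitForCharacterLipSync(false).
    while (_waitForCharacterLipSync)
        yield return null;

    // ...dequeue and play the next ResponseAudio here...
}
```

This explains why a locally enqueued clip never plays on its own: no server response ever clears the flag, so the coroutine waits forever unless you clear it yourself.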

Yes, I saw that at line 99 in ConvaiNPCAudioManager.cs.
With some debug output I saw that the flag is not false, so I had to add

SetWaitForCharacterLipSync(false);

in my script.

Is it safe to set this to false? Or should I call something else beforehand, i.e. a function that eventually sets it to false?

Please try it without that call; it should work.

It didn’t, if nothing happens beforehand, meaning no other user input. I had to add the SetWaitForCharacterLipSync(false) call to get it working.

Yes, since it’s outside the normal flow, it may be necessary. If it’s working as expected, then there’s no issue.

Hmm, I’m a bit lost at the moment with getting lip sync working for my custom text.

Using this shows the text and plays the audio file:

convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);
convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
{
    AudioClip = _introductionRoom.introductionClip,
    AudioTranscript = _introductionRoom.introductionTranscript,
    IsFinal = false
});

But lip sync is not working. I tried to dig through the code, but I’m not sure how to set it up. I’m also not sure whether this is possible and whether it could be cached somehow.

From what I can see, ProcessAudioResponse at line 580 of ConvaiGRPCAPI.cs contains the lipSyncBlendFrameQueue, which “is the lip-sync data”? Can I create this data in my app by providing an audio file and/or a text string?

Thanks

Hi @K3, any update on my question?
Thanks

Thank you for following up. As this is currently not a supported use case, we’re unable to prioritize it as a high urgency request at this time.

We appreciate your understanding and will keep you updated if there are any changes.

Thanks for the reply.

I’ll dig deeper and see if I can use the “create lip sync” functionality locally. I guess the lip sync is created locally, right?
Can you point me in the right direction: can I input an audio file (MP3) or a text somewhere and get lip-sync “data” back?
I mean, you get a response from the server with audio (MP3?) and text, right? That is then used to create the lip-sync data, and once that is finished, all three “contents” are used/played(?).

Thanks

No, not locally.

Hmm, ok. I saw that the data is processed in the ReceiveResultFromServer function in ConvaiGRPCAPI.cs. The ProcessAudioResponse function is called and handles the lip-sync part. Indeed, it seems it is not possible to create it locally(?!).

Would it be possible to store/cache a response for direct use in the app, so that I could play a cached response immediately, without the delay of “InvokeTrigger” and waiting for the server response?
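To make the caching idea concrete, here is roughly what I mean, sketched with the public types used earlier in the thread. CachedResponse storage via a Dictionary and the method names are my own illustration, not SDK code, and per the discussion above the cached entries would still lack lip-sync data, since that is produced server-side.

```csharp
// Illustrative caching idea — not SDK code. Lip-sync data would still be
// missing, since it is generated on the server side.
private readonly Dictionary<string, ConvaiNPCAudioManager.ResponseAudio> _cache
    = new Dictionary<string, ConvaiNPCAudioManager.ResponseAudio>();

// Store a clip + transcript under a key for later playback.
public void CacheResponse(string key, AudioClip clip, string transcript)
{
    _cache[key] = new ConvaiNPCAudioManager.ResponseAudio
    {
        AudioClip = clip,
        AudioTranscript = transcript,
        IsFinal = false
    };
}

// Play a cached response immediately, without a server round trip.
public void PlayCached(ConvaiNPC npc, string key)
{
    if (!_cache.TryGetValue(key, out var response)) return;

    // No lip-sync data will ever arrive for a cached clip, so don't wait for it.
    npc.AudioManager.SetWaitForCharacterLipSync(false);
    npc.AudioManager.AddResponseAudio(response);
}
```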

Thanks again