Audio (mp3) input for NPC

Similar to this question (Text box to have Convai say whatever i write - #2 by K3), I want to be able to play an MP3 file and show the text.

I created this MP3 at ElevenLabs with pauses between the paragraphs. I want to play it and also show the transcript in the UI. How can I do this?
Is it possible to use EnqueueResponse in the ConvaiNPC.cs script? Or AddResponseAudio in ConvaiNPCAudioManager?

Ok, I have it working like this. Please review and tell me why I need to use
SetWaitForCharacterLipSync(false)
and whether that could cause any conflicts? Right now it seems to work.

convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);
convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
{
    AudioClip = _introductionRoom.introductionClip,
    AudioTranscript = _introductionRoom.introductionTranscript,
    IsFinal = false
});

introductionClip is my .mp3 file (imported as an AudioClip) and introductionTranscript is the text as a string.
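To avoid repeating the two calls, the snippet above can be wrapped in a small helper. This is only a sketch: PlayLocalClip is a name I made up, and the IsFinal value simply follows the working snippet above.

```csharp
// Hypothetical helper wrapping the working snippet above — not part of the SDK.
private void PlayLocalClip(ConvaiNPC convaiNPC, AudioClip clip, string transcript)
{
    // Skip waiting for lip-sync data, since no server response will provide any.
    convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);

    convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
    {
        AudioClip = clip,             // locally imported .mp3 AudioClip
        AudioTranscript = transcript, // text shown in the UI
        IsFinal = false               // as in the working snippet above
    });
}
```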

In the PlayAudioInOrder coroutine, the while (_waitForCharacterLipSync) statement waits for the lip-sync data to arrive. SetWaitForCharacterLipSync(false) controls this loop, allowing processing to continue.
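For context, the gating pattern described here looks roughly like this. This is an illustrative sketch, not the actual ConvaiNPCAudioManager source; only the names SetWaitForCharacterLipSync and PlayAudioInOrder come from the thread.

```csharp
// Illustrative sketch of the gating pattern — not the actual SDK source.
private bool _waitForCharacterLipSync = true;

public void SetWaitForCharacterLipSync(bool wait) => _waitForCharacterLipSync = wait;

private IEnumerator PlayAudioInOrder()
{
    // Block playback until lip-sync data has arrived, or until the flag
    // is cleared externally via SetWaitForCharacterLipSync(false).
    while (_waitForCharacterLipSync)
        yield return null;

    // ...dequeue and play the next ResponseAudio here...
}
```

This explains why a locally enqueued clip never plays on its own: no server response ever clears the flag, so the coroutine waits forever unless you clear it yourself.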

Yes, I saw that at line 99 in ConvaiNPCAudioManager.cs.
With some debug output I saw that the flag is not false, so I had to add

SetWaitForCharacterLipSync(false);

in my script.

Is it safe to set this to false? Or should I call something else beforehand, i.e. a function that eventually sets it to false?

Please try it without that call; it should work.

It didn’t, if nothing happens beforehand, meaning no other user input. I had to add the SetWaitForCharacterLipSync(false) call to get it working.

Yes, since it’s outside the normal flow, it may be necessary. If it’s working as expected, then there’s no issue.

Hmm, I’m a bit lost at the moment with getting lip sync working for my custom text.

Using this shows the text and plays the audio file:

convaiNPC.AudioManager.SetWaitForCharacterLipSync(false);
convaiNPC.AudioManager.AddResponseAudio(new ConvaiNPCAudioManager.ResponseAudio
{
    AudioClip = _introductionRoom.introductionClip,
    AudioTranscript = _introductionRoom.introductionTranscript,
    IsFinal = false
});

But lip sync is not working. I tried to dig through the code, but I’m not sure how to set it up. I’m also not sure whether this is possible and whether it could be cached somehow.

From what I can see, ProcessAudioResponse at line 580 of ConvaiGRPCAPI.cs contains the lipSyncBlendFrameQueue, which “is the lip-sync data”? Can I create this data in my app by providing an audio file and/or a text string?

Thanks

Hi @K3, any update on my question?
Thanks

Thank you for following up. As this is currently not a supported use case, we’re unable to prioritize it as a high urgency request at this time.

We appreciate your understanding and will keep you updated if there are any changes.

Thanks for the reply.

I’ll dig deeper and see if I can use the “create lip sync” functionality locally. I guess the lip sync is created locally, right?
Can you point me in the right direction: can I input an audio file (MP3) or a text somewhere and get lip-sync “data” back?
I mean, you get a response from the server with audio (MP3?) and text, right? That is then used to create the lip-sync data, and once that is finished, all three “contents” are used/played(?).

Thanks

No, not locally.

Hmm, ok. I saw that the data is processed in the ReceiveResultFromServer function in ConvaiGRPCAPI.cs. The ProcessAudioResponse function is called and handles the lip-sync part. Indeed, it seems it is not possible to create it locally(?!).

Would it be possible to store/cache a response for direct use in the app, so that I could play a cached response immediately, without the delay of “InvokeTrigger” and waiting for the server response?
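To make the caching idea concrete, here is roughly what I mean, sketched with the public types used earlier in the thread. CachedResponse storage via a Dictionary and the method names are my own illustration, not SDK code, and per the discussion above the cached entries would still lack lip-sync data, since that is produced server-side.

```csharp
// Illustrative caching idea — not SDK code. Lip-sync data would still be
// missing, since it is generated on the server side.
private readonly Dictionary<string, ConvaiNPCAudioManager.ResponseAudio> _cache
    = new Dictionary<string, ConvaiNPCAudioManager.ResponseAudio>();

// Store a clip + transcript under a key for later playback.
public void CacheResponse(string key, AudioClip clip, string transcript)
{
    _cache[key] = new ConvaiNPCAudioManager.ResponseAudio
    {
        AudioClip = clip,
        AudioTranscript = transcript,
        IsFinal = false
    };
}

// Play a cached response immediately, without a server round trip.
public void PlayCached(ConvaiNPC npc, string key)
{
    if (!_cache.TryGetValue(key, out var response)) return;

    // No lip-sync data will ever arrive for a cached clip, so don't wait for it.
    npc.AudioManager.SetWaitForCharacterLipSync(false);
    npc.AudioManager.AddResponseAudio(response);
}
```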

Thanks again