Using ConvaiFaceSync for Lip Sync with Custom Streaming TTS Model

I’m currently using the Convai plugin in Unreal Engine, but I’d like to replace Convai’s built-in voices with my own streaming TTS model while still using ConvaiFaceSync for lip sync.

Is there a way to feed my custom TTS model’s phoneme/timestamp data into ConvaiFaceSync? How does ConvaiFaceSync generate lip sync: does it rely on specific phoneme input, time markers, or raw audio waveforms?
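
For context, here is roughly the kind of data my streaming TTS model emits per chunk (a hypothetical sketch; these struct and field names are my own and are not part of the Convai plugin or any ConvaiFaceSync API):

```cpp
#include "CoreMinimal.h"

// Hypothetical shape of my TTS output -- not a Convai/ConvaiFaceSync type.
struct FPhonemeEvent
{
    FString Phoneme;   // e.g. "AA", "T", "S" (ARPAbet-style label)
    float   StartTime; // seconds from the start of the utterance
    float   Duration;  // seconds the phoneme is held
};

// Each streamed audio chunk arrives together with the phoneme events it covers.
struct FTTSChunk
{
    TArray<uint8>         AudioPCM; // 16-bit mono PCM samples for this chunk
    TArray<FPhonemeEvent> Phonemes; // time-aligned phoneme events
};
```

The question is which of these, if any, ConvaiFaceSync could consume as input.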

Any guidance on how to make this work would be greatly appreciated.

Hello @Elon,

Unfortunately, using a custom TTS model with ConvaiFaceSync is not a supported use case, so we are unable to provide assistance for this setup.

We appreciate your understanding. Let us know if you have any other questions!