I’m currently using the Convai plugin in Unreal Engine, and I’d like to replace Convai’s built-in voices with my own streaming TTS model while still using ConvaiFaceSync for lip sync.
Is there a way to connect my custom TTS model’s phoneme/timestamp data to ConvaiFaceSync? How does ConvaiFaceSync actually drive lip sync: does it expect specific phoneme input, time markers, or just the raw audio waveform?
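For context, here is a rough sketch of what I have in mind on the audio side. The `USoundWaveProcedural`/`QueueAudio` part is standard Unreal streaming audio; the face-sync calls at the bottom are placeholders (names like `FeedAudioChunk`, `FeedPhonemes`, `FMyPhonemeEvent`, and my character/component members are made up), since the real ConvaiFaceSync interface is exactly what I’m asking about:

```cpp
#include "Sound/SoundWaveProcedural.h"
#include "Components/AudioComponent.h"

// Called whenever my custom streaming TTS delivers a new chunk of
// 16-bit mono PCM plus (optionally) phoneme/timestamp events.
void AMyTTSCharacter::OnTTSChunkReceived(const TArray<uint8>& PcmChunk,
                                         const TArray<FMyPhonemeEvent>& Phonemes)
{
    // 1. Play the streamed audio through a procedural sound wave.
    if (!StreamingWave)
    {
        StreamingWave = NewObject<USoundWaveProcedural>(this);
        StreamingWave->SetSampleRate(24000); // my TTS outputs 24 kHz mono PCM16
        StreamingWave->NumChannels = 1;
        AudioComponent->SetSound(StreamingWave);
        AudioComponent->Play();
    }
    StreamingWave->QueueAudio(PcmChunk.GetData(), PcmChunk.Num());

    // 2. Drive lip sync. This is the part I don't know how to do:
    //    does ConvaiFaceSync want the raw waveform, visemes, or
    //    phoneme + timestamp pairs? Both calls below are hypothetical.
    // FaceSyncComponent->FeedAudioChunk(PcmChunk, /*SampleRate=*/24000, /*NumChannels=*/1);
    // FaceSyncComponent->FeedPhonemes(Phonemes); // e.g. {Phoneme, StartTime, Duration}
}
```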
Any guidance on how to make this work would be greatly appreciated.