Need guidance on integrating custom Unreal avatars with Convai via Pixel Streaming

Hi Convai team,

We’re building a custom avatar pipeline in Unreal Engine and plan to use Pixel Streaming to display our AI avatar in a frontend web application.

Here’s a quick summary of our setup and the questions we have:

:wrench: Our Setup

  • Avatar is built in Unreal Engine and deployed via Convai.
  • We require audio-visual synchronization between the avatar’s animation and AI-generated speech.
  • The frontend application needs to stream the avatar in real time, preferably via Unreal Pixel Streaming.

:red_question_mark: Key Questions

  1. What is the recommended method to stream a Convai-deployed avatar (with synced audio and animation) to a React or Vite frontend?

    • Should we host the Unreal client locally and use Pixel Streaming directly?
    • Or should we rely entirely on Convai’s streaming endpoint if the avatar is hosted on your infrastructure?
  2. Can we upload our own custom avatars (e.g., .glb files or MetaHuman models) directly into Convai’s system for use with their AI models and animation pipeline?

    • If so, what’s the best practice for deployment—via Convai or Unreal Engine?
  3. What are the necessary components or services required to deploy Pixel Streaming in production from an Unreal project that uses Convai?

    • Signalling server setup, media servers, firewall rules, etc.
    • Is there a reference deployment architecture you recommend?
  4. For real-time syncing of voice and facial animation, does Convai handle both audio synthesis and animation control, or do we need to separately trigger animation sequences inside Unreal based on TTS input?

  5. What’s the best way to integrate user input (e.g., text or microphone input from the browser) into the streaming avatar’s conversational loop?

    • Any sample integrations or frontend SDKs available?
  6. Are there any bandwidth, latency, or GPU requirements we should plan for when using Convai + Unreal Pixel Streaming at scale?

  7. Is there a roadmap for supporting other streaming technologies (e.g., WebRTC directly without Unreal client dependency) for rendering avatars?

  8. We need to send the user’s input (whatever the user says to the avatar) to our backend server for further processing. Is this possible if the avatar is hosted on Convai’s infrastructure, or if the Unreal client is hosted in the cloud? If yes, please point us in the right direction.

Looking forward to your guidance and any relevant documentation or code samples.

Thanks in advance!
— Shubham

Hello @Shubham_Vaidya,

Welcome to the Convai Developer Forum!

Thanks for reaching out and for providing such a detailed overview of your setup and questions!

Here are some clarifications:

  1. You are free to use your own Pixel Streaming setup; a minimal frontend connection sketch follows this list. If you’d prefer to use Convai’s infrastructure, please schedule a call with our sales team: https://calendly.com/convai-intro/hi

  2. Yes, you will soon be able to upload custom avatars directly into Avatar Studio. This feature is currently in development and will be released shortly.

  3. If you host Pixel Streaming yourself, Epic’s standard Pixel Streaming documentation and reference architecture apply: a signalling server, STUN/TURN for NAT traversal, and appropriate firewall rules for WebRTC traffic. Convai handles the backend if you use our streaming solution.

  4. Audio and facial animation synchronization is handled automatically by Convai’s SDK. You do not need to trigger animations manually.

  5. If you’re using our Unreal Engine SDK, we’ve already taken care of this part. For a self-hosted Pixel Streaming frontend, see the input-forwarding sketch further below.

  6. Performance requirements will vary depending on your specific deployment and scale.

  7. Could you clarify what you’re looking for here? If the goal is rendering avatars in the browser without an Unreal client dependency, the Embed Experience described below may already cover that use case.

  8. Yes, this is possible; the relay sketch near the end of this reply shows one self-hosted approach.
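
Regarding points 1 and 3: if you self-host, a minimal connection sketch for a React/Vite frontend might look like the following. This assumes Epic’s Pixel Streaming frontend library; the package version, settings keys, and the `ws://localhost:80` signalling server URL are placeholders to verify against your engine release:

```tsx
// Minimal sketch: attach a self-hosted Pixel Streaming feed to a React component.
// Assumes Epic's frontend library for UE 5.x; verify the package name/version
// and settings keys against the release that matches your engine.
import { useEffect, useRef } from "react";
import { Config, PixelStreaming } from "@epicgames-ps/lib-pixelstreamingfrontend-ue5.2";

export function AvatarStream() {
  const containerRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    if (!containerRef.current) return;

    const config = new Config({
      initialSettings: {
        ss: "ws://localhost:80", // placeholder signalling server URL
        AutoConnect: true,
        AutoPlayVideo: true,
        StartVideoMuted: false,
      },
    });

    // The library creates and manages the <video> element for us.
    const stream = new PixelStreaming(config, {
      videoElementParent: containerRef.current,
    });

    return () => stream.disconnect();
  }, []);

  return <div ref={containerRef} style={{ width: "100%", height: "100%" }} />;
}
```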


To summarize, for a streamlined setup we highly recommend exploring Avatar Studio and Embed Experience. These allow you to get started within minutes and offer extensive customization options for your character and experience.

If you’re interested in learning more or tailoring a solution to your specific project needs, feel free to schedule a call with our sales team here:
https://calendly.com/convai-intro/hi

You can also watch our overview video on the Embed Experience:
https://youtu.be/8myRcmAe-6o

If you prefer to manage your own Pixel Streaming infrastructure, that is entirely up to you. However, please note that we do not provide technical support for third-party Pixel Streaming deployments. For guidance, we recommend referring to external Unreal Engine resources.
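
That said, to illustrate how browser input typically reaches a self-hosted Unreal client (your questions 5 and 8): Epic’s frontend exposes a data channel via `emitUIInteraction`. A hedged sketch follows; the `sendUserMessage` helper and the `"user_message"` payload shape are our own illustrative assumptions, not a Convai or Epic convention:

```ts
import { PixelStreaming } from "@epicgames-ps/lib-pixelstreamingfrontend-ue5.2";

// Sketch: forward user text from the browser into the Unreal client over the
// Pixel Streaming data channel. emitUIInteraction() is Epic's standard
// frontend API for this; the payload shape below is arbitrary.
export function sendUserMessage(stream: PixelStreaming, text: string): void {
  stream.emitUIInteraction({
    type: "user_message", // hypothetical discriminator your Unreal side checks for
    text,
  });
}
```

On the Unreal side, you would bind to the Pixel Streaming Input component’s input event (see Epic’s Pixel Streaming interaction docs for your engine version) and parse the JSON descriptor there.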

Convai-specific integration details for Pixel Streaming are available here:
https://docs.convai.com/api-docs/plugins-and-integrations/unreal-engine/guides/integration-with-pixel-streaming
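
For question 8 in a self-hosted setup, one pattern (a sketch under assumptions, not the only option) is to have the Unreal client emit each user utterance with the “Send Pixel Streaming Response” node and relay it from the browser to your own backend. The listener name, the `"user_utterance"` type, and the endpoint URL below are hypothetical placeholders:

```ts
import { PixelStreaming } from "@epicgames-ps/lib-pixelstreamingfrontend-ue5.2";

// Sketch: relay messages emitted by the Unreal client (via the
// "Send Pixel Streaming Response" node) to your own backend server.
// Note: the first argument only names the listener for later removal;
// every response message reaches it, so filter by payload.
export function relayUserInput(stream: PixelStreaming): void {
  stream.addResponseEventListener("relay-to-backend", async (response: string) => {
    let message: { type?: string };
    try {
      message = JSON.parse(response);
    } catch {
      return; // ignore non-JSON messages
    }
    if (message.type !== "user_utterance") return;

    // Forward the raw payload to your backend for further processing.
    await fetch("https://your-backend.example.com/api/utterances", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: response,
    });
  });
}
```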

Lastly, features like emotion handling, lipsync, and user input are fully supported by the Convai Unreal SDK by default; no extra configuration is required.

Let us know if you need any further assistance!