Synchronizing keyword events with character speech

Has anyone managed to work with keywords in Convai? In other words, I want a way to check what the character is saying in real time and see whether it has said any keywords.

Hello @dev.euvatar,

Yes, you can definitely achieve keyword detection using the OnTextReceived event from the ConvaiChatBot component in Unreal Engine. This event gives you access to the character’s response text in real time. Once you have the response, you can use a simple string comparison, such as a Contains node in Blueprints or a string method in code.

Keep in mind that the Contains check is case-sensitive, so you may want to convert both the response text and the keyword to lower case to ensure consistent matching. Once a match is found, you can trigger any action you’d like based on that keyword.
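Outside of Blueprints, the same case-insensitive Contains check might look like the sketch below. It uses plain `std::string` as a stand-in for Unreal's `FString` so it stays self-contained; the function names are mine, not part of the Convai API:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a copy of the input so the comparison ignores case.
std::string ToLower(std::string Text)
{
    std::transform(Text.begin(), Text.end(), Text.begin(),
                   [](unsigned char C) { return std::tolower(C); });
    return Text;
}

// Returns true if Response contains Keyword, ignoring case --
// the equivalent of lower-casing both sides before a Contains node.
bool ContainsKeyword(const std::string& Response, const std::string& Keyword)
{
    return ToLower(Response).find(ToLower(Keyword)) != std::string::npos;
}
```

With `FString` you could instead pass `ESearchCase::IgnoreCase` to `FString::Contains` and skip the manual lower-casing.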


But when I use OnTextReceived, I receive all the text at once, and I need to detect the keyword at the exact moment the character speaks it. Using the function in the traditional way and just checking whether the text contains the keyword would not be synchronized with the character's speech.

Unfortunately, I don’t quite understand what you want to do.

The attached image shows an example of the text I receive through OnTextReceived. In this example, the keyword is "our club", which is at the end of the text. So if I check whether the text contains this keyword as soon as it arrives, the check will succeed, but not in sync with the character's speech: the match is already valid even though the character hasn't said the keyword yet, because it comes at the end of the text.

Yes, unfortunately that’s not possible. You can develop something custom.
Why are you trying to do that?

Because I need certain events to be triggered by certain keywords. For example "Swimming pool": every time Convai's character says "Swimming pool", an event should be executed.

As I said, there is no event or specific function for this. You have to develop this.
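One custom approach you could sketch: since OnTextReceived delivers the full response before playback, estimate when a keyword is actually spoken from its character offset within the text and the duration of the generated audio, then fire your event on a timer at that moment. The linear-pacing model below is an assumption of mine (it ignores pauses and per-word speed), and the function name is hypothetical, not a Convai API:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Estimate the playback time (in seconds) at which Keyword starts being
// spoken, assuming speech progresses roughly linearly through the text.
// Returns a negative value when the keyword is not present.
double EstimateKeywordTime(const std::string& Response,
                           const std::string& Keyword,
                           double AudioDurationSeconds)
{
    auto Lower = [](std::string S) {
        std::transform(S.begin(), S.end(), S.begin(),
                       [](unsigned char C) { return std::tolower(C); });
        return S;
    };

    const std::string Text = Lower(Response);
    const std::size_t Offset = Text.find(Lower(Keyword));
    if (Offset == std::string::npos || Text.empty())
        return -1.0;

    // Fraction of the text spoken before the keyword begins.
    const double Fraction = static_cast<double>(Offset) / Text.size();
    return Fraction * AudioDurationSeconds;
}
```

In Unreal you would then schedule the event with a timer (e.g. `FTimerManager::SetTimer`) for that offset once playback starts. It's only a rough approximation, but it avoids needing access to the audio stream itself.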

Ok, but I have a question about Convai's logic. When the response message is received, it is converted into audio through some Text to Speech, right? If so, where and how can I access that audio-generation step?

My idea is to take the generated audio and feed it into a separate Speech to Text.

Any feedback?

We currently don’t support direct customization or extraction of the generated audio stream for use in other pipelines like reprocessing with a separate STT system. These kinds of advanced integrations fall outside the scope of standard support.

If your project requires this level of control, we’d recommend reaching out to our sales team at sales@convai.com to discuss options under the Enterprise plan, which offers more flexibility for custom implementations.