So, there are two main classes: the “media handler” and the “API handler”.
Info:
- One script handles the API: sends user text/audio to Convai, receives the AI’s response, plays the voice audio if included.
- The other one intercepts the AI’s response text, looks for patterns like “video of…” or other language variants, and uses fuzzy matching to find the closest local video listed in a videoList.txt file (which just contains the video file names; I know, it’s not very dynamic).
- If a match is found, it shows the video UI and plays the corresponding video from the local folder.
- Video titles are loaded asynchronously from videoList.txt in StreamingAssets/Videos, making it easy to update the list without recompiling (see the loading sketch after this list).
- For fuzzy matching, I used FuzzySharp (a NuGet package) to compare the requested video name with the available titles and trigger playback when the match confidence exceeds a threshold (I set it at 30%); the matching sketch below shows the idea.
- I handle platform differences by copying the video to Application.persistentDataPath before playing it, which is especially needed on Android where direct access to StreamingAssets isn’t possible (see the copy-and-play sketch below).
- Commands work across multiple languages by checking for keywords like:
- “video of” (English)
- “video di” (Italian)
- “vidéo de” (French)
- etc.
- Example workflow (the glue sketch after this list ties it together):
- User says: “I’d like to see the video of -insert video title here-.”
- Convai responds: “Here is the video of -insert video title here-.”
- The media handler parses “video of -insert video title here-”, matches it to the closest video, and plays it automatically.
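
To make the steps above concrete, here is a minimal sketch of the asynchronous title loading (simplified: class, method, and field names are illustrative, not the exact ones from my scripts). UnityWebRequest is used because plain File IO can’t read from StreamingAssets on Android:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;

public class VideoTitleLoader : MonoBehaviour
{
    private readonly List<string> videoTitles = new List<string>();

    private void Start()
    {
        StartCoroutine(LoadVideoTitles());
    }

    // Reads videoList.txt from StreamingAssets/Videos. UnityWebRequest is used
    // because on Android the StreamingAssets folder lives inside the APK and
    // cannot be accessed with regular file IO.
    private IEnumerator LoadVideoTitles()
    {
        string path = Path.Combine(Application.streamingAssetsPath, "Videos", "videoList.txt");
        using (UnityWebRequest request = UnityWebRequest.Get(path))
        {
            yield return request.SendWebRequest();

            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Could not load video list: {request.error}");
                yield break;
            }

            videoTitles.Clear();
            foreach (string line in request.downloadHandler.text.Split('\n'))
            {
                string title = line.Trim();
                if (title.Length > 0) videoTitles.Add(title);
            }
        }
    }
}
```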
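
The keyword check and the fuzzy matching boil down to something like this (again a simplified sketch; the real handler checks more language variants than the three shown, and FuzzySharp scores on a 0–100 scale, so my 30% threshold is a score of 30):

```csharp
using System;
using FuzzySharp;

public static class VideoCommandParser
{
    // Keyword variants checked in the AI's response text (lowercase).
    private static readonly string[] Keywords = { "video of", "video di", "vidéo de" };

    private const int MatchThreshold = 30; // FuzzySharp scores range from 0 to 100

    // Returns the best-matching title from availableTitles, or null when no
    // keyword is present or no title scores above the threshold.
    public static string FindRequestedVideo(string responseText, string[] availableTitles)
    {
        string lower = responseText.ToLowerInvariant();

        foreach (string keyword in Keywords)
        {
            int index = lower.IndexOf(keyword, StringComparison.Ordinal);
            if (index < 0) continue;

            // Everything after the keyword is treated as the requested title.
            string requested = responseText
                .Substring(index + keyword.Length)
                .Trim(' ', '.', '!', '?');

            // Fuzzy-match the request against the titles from videoList.txt.
            var best = Process.ExtractOne(requested, availableTitles);
            if (best != null && best.Score >= MatchThreshold)
                return best.Value;
        }
        return null;
    }
}
```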
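
The Android workaround looks roughly like this (CopyAndPlay is an illustrative name; the caller is expected to start it as a coroutine):

```csharp
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.Video;

public class VideoCopyPlayer : MonoBehaviour
{
    [SerializeField] private VideoPlayer videoPlayer;

    // Copies the clip out of StreamingAssets into persistentDataPath, then
    // points the VideoPlayer at the copy. On Android, StreamingAssets lives
    // inside the APK, so the VideoPlayer can't read it directly.
    public IEnumerator CopyAndPlay(string fileName)
    {
        string source = Path.Combine(Application.streamingAssetsPath, "Videos", fileName);
        string destination = Path.Combine(Application.persistentDataPath, fileName);

        if (!File.Exists(destination))
        {
            using (UnityWebRequest request = UnityWebRequest.Get(source))
            {
                yield return request.SendWebRequest();

                if (request.result != UnityWebRequest.Result.Success)
                {
                    Debug.LogError($"Could not copy video: {request.error}");
                    yield break;
                }
                File.WriteAllBytes(destination, request.downloadHandler.data);
            }
        }

        videoPlayer.url = destination;
        videoPlayer.Play();
    }
}
```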
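
Putting the pieces together, the example workflow above maps to glue code along these lines (purely hypothetical wiring to show the flow; the exact hookup in the real classes differs):

```csharp
using System.Collections.Generic;
using UnityEngine;

public class MediaHandlerGlue : MonoBehaviour
{
    [SerializeField] private GameObject videoUI;          // the video player canvas
    [SerializeField] private VideoCopyPlayer copyPlayer;  // sketch from above

    private List<string> videoTitles = new List<string>(); // filled by the title loader

    // Called whenever a Convai response arrives; the API handler does the wiring.
    public void OnConvaiResponse(string responseText)
    {
        string title = VideoCommandParser.FindRequestedVideo(responseText, videoTitles.ToArray());
        if (title == null) return; // no video command detected

        videoUI.SetActive(true);                       // show the video UI
        StartCoroutine(copyPlayer.CopyAndPlay(title)); // copy if needed, then play
    }
}
```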
Here are the links to the two classes:
The ConvaiAPIHandler is attached to an empty GameObject, while the ConvaiMediaHandler is attached directly to the canvas of the video player.
If you have any questions, please ask.