Hello Convai team and community,
I am currently developing a project in Unreal Engine using Convai with MetaHuman, and I would like to ask about an issue related to Japanese pronunciation.
Even when the Convai character’s speaking language is set to Japanese, kanji words are frequently mispronounced during speech output.
For example:
- 自己嫌悪 is pronounced as “jiko-ken-aku” instead of the correct “jiko-ken-o”.
To address this, I tried using Language and Speech → Add Custom Pronunciation, but the correction does not seem to be reflected in the actual spoken output.
My current setup and observations are as follows:
- Language: Japanese
- Engine: Unreal Engine
- Character: MetaHuman + Convai
- The issue occurs during spoken responses (TTS), not just in the displayed text
- The custom pronunciation entry is saved in the UI, but the spoken output remains incorrect
- The UI notes that Custom Pronunciation currently works for English only; could this be the root cause?
I would appreciate clarification on the following points:
- Is Custom Pronunciation officially unsupported for Japanese, even though the UI allows input?
- For Japanese kanji mispronunciations, what are the recommended workarounds at this time? For example:
  - Manually rewriting text in hiragana or katakana
  - Forcing phonetic readings via preprocessing, SSML, or similar methods (a sketch of what I have in mind follows this list)
- Are there any concrete solutions, such as changing the AI model or the voice/TTS engine, that could improve Japanese pronunciation accuracy? For example, selecting a model that is stronger in Japanese, or using a different TTS pipeline.
- Are there any plans to support Japanese custom pronunciation in the future?
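To clarify what I mean by forcing phonetic readings via preprocessing, here is a minimal sketch of the approach I am considering on the Unreal side. It assumes the character's response text can be intercepted before it is handed to TTS; the function names below are my own placeholders, not existing Convai plugin APIs.

```cpp
#include "CoreMinimal.h"

// Dictionary of known problem words mapped to their katakana readings.
// Entries are collected manually as misreadings are observed.
static const TMap<FString, FString>& GetReadingOverrides()
{
    static TMap<FString, FString> Overrides;
    if (Overrides.Num() == 0)
    {
        // 自己嫌悪 should be read "jiko-ken-o", so substitute the katakana reading.
        Overrides.Add(TEXT("自己嫌悪"), TEXT("ジコケンオ"));
        // ...additional entries as further misreadings are found
    }
    return Overrides;
}

// Replace each listed kanji word with its katakana reading before the text reaches TTS.
FString ApplyReadingOverrides(const FString& ResponseText)
{
    FString Result = ResponseText;
    for (const TPair<FString, FString>& Pair : GetReadingOverrides())
    {
        Result = Result.Replace(*Pair.Key, *Pair.Value, ESearchCase::CaseSensitive);
    }
    return Result;
}
```

If the underlying TTS engine accepts SSML, an alternative would be to wrap such words in a substitution tag rather than changing the visible text, but I do not know whether the Convai pipeline exposes that, which is part of my question above.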
This issue directly affects the naturalness of the conversation experience, so it is something we are very eager to resolve.
Any best practices, technical guidance, or realistic mitigation strategies would be greatly appreciated.
Thank you very much for your support.
Best regards,
Sacco
