Japanese kanji are frequently mispronounced by Convai characters even after adding custom pronunciation

Hello Convai team and community,

I am currently developing a project in Unreal Engine using Convai with MetaHuman, and I would like to ask about an issue related to Japanese pronunciation.

Even when the Convai character’s speaking language is set to Japanese, kanji words are frequently mispronounced during speech output.
For example:

  • 自己嫌悪 is pronounced as “jiko-ken-aku” instead of the correct “jiko-ken-o”.

To address this, I tried using Language and Speech → Add Custom Pronunciation, but the correction does not seem to be reflected in the actual spoken output.

My current setup and observations are as follows:

  • Language: Japanese

  • Engine: Unreal Engine

  • Character: MetaHuman + Convai

  • The issue occurs during spoken responses (TTS), not just text display

  • Custom pronunciation is saved in the UI, but the spoken output remains incorrect

  • The UI mentions that Custom Pronunciation currently works for English only — could this be the root cause?

I would appreciate clarification on the following points:

  1. Is Custom Pronunciation officially unsupported for Japanese, even though the UI allows input?

  2. For Japanese kanji mispronunciations, what are the recommended workarounds at this time?

    • Manually rewriting text in hiragana or katakana

    • Forcing phonetic readings via preprocessing, SSML, or similar methods

  3. Are there any concrete solutions such as changing the AI model or voice/TTS engine that could improve Japanese pronunciation accuracy?

    • For example, selecting a model that is stronger in Japanese, or using a different TTS pipeline

  4. Are there any plans to support Japanese custom pronunciation in the future?
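
As an illustration of the preprocessing option in point 2, here is a minimal sketch that swaps known problem words for their kana readings before the text is spoken. It assumes the response text can be intercepted before it reaches the TTS engine, which may or may not be possible in a given Convai setup. The names `READING_OVERRIDES`, `build_replacer`, and `to_speakable` are hypothetical and are not part of any Convai API.

```python
import re

# Hypothetical override table: kanji word -> kana reading to send to TTS.
# Entries are added by hand as mispronunciations are discovered.
READING_OVERRIDES = {
    "自己嫌悪": "じこけんお",
}


def build_replacer(overrides):
    """Return a function that rewrites every override key in a single pass."""
    if not overrides:
        # Nothing to replace; return the text unchanged.
        return lambda text: text

    # Longest keys first, so a short entry never matches inside a longer one.
    keys = sorted(overrides, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace(text):
        return pattern.sub(lambda match: overrides[match.group(0)], text)

    return replace


to_speakable = build_replacer(READING_OVERRIDES)

# Only the spoken text is rewritten; the displayed text can keep the kanji.
print(to_speakable("彼は自己嫌悪に陥った。"))
```

Keeping the displayed text and the spoken text separate means subtitles still show the kanji while the voice receives the kana reading. Whether the kana version is then read with natural pitch accent depends on the voice, so this is a mitigation rather than a full fix.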

This issue makes conversations feel noticeably unnatural, so we are very eager to resolve it.
Any best practices, technical guidance, or realistic mitigation strategies would be greatly appreciated.

Thank you very much for your support.

Best regards,
Sacco

Hello,

Welcome to the Convai Developer Forum!

At the moment, the Custom Pronunciation feature only supports English:
https://docs.convai.com/api-docs/convai-playground/character-customization/language-and-speech#id-3.-add-custom-pronunciation

The best approach is to switch the character’s voice to one of the GCP Japanese voices, choosing one whose name or description specifies both the style and the language, for example:

Despina (Smooth, Gentle Japanese Female Voice)

I’ve tried various options, including GCP Japanese Voices and Azure Voices, but they all sound extremely flat and monotone.
To be honest, they feel quite robotic, almost like text-to-speech technology from around 10 years ago. The lack of natural Japanese intonation, rhythm, and emotional nuance is really disappointing.

If improving this is difficult through voice selection alone, I would really appreciate any guidance on how to improve the results through:

  • prompt design

  • Convai configuration (emotion, speaking style, pacing, pauses, etc.)

Are there any recommended settings, best practices, or workarounds to achieve more natural, expressive Japanese speech?
Also, if there are plans to support more advanced or modern Japanese voice engines in the future, I’d love to hear about that as well.

These are already the newest voices available from those providers. For higher-quality output, you can use ElevenLabs voices.