Japanese API

Japanese G2P uses pyopenjtalk (the default backend) for text analysis and mora-based phoneme generation; cutlet is supported as an alternative backend.

Main Class

class kokorog2p.ja.JapaneseG2P(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs)

Bases: G2PBase

Japanese G2P using pyopenjtalk or cutlet.

Example:
>>> g2p = JapaneseG2P()
>>> tokens = g2p("こんにちは")

__init__(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs) → None

Initialize the Japanese G2P.

Args:

language: Language code (e.g., 'ja', 'ja-jp').
use_espeak_fallback: Whether to use espeak for unknown words.
use_spacy: Reserved for API consistency; Japanese uses the pyopenjtalk/cutlet backends for tokenization and phonemization.
spacy_model: Reserved for API consistency when use_spacy is enabled.
backend: Backend to use ("pyopenjtalk" or "cutlet").
unk: Placeholder for unknown tokens.
load_silver: If True, load the silver-tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use; it defaults to True for consistency.
load_gold: If True, load the gold-tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use; it defaults to True for consistency.
version: Model version ("1.0" for base, "1.1" for multilingual). Default: "1.0".
**kwargs: Additional arguments.

property pyopenjtalk

Lazy import of pyopenjtalk.

property cutlet

Lazy initialization of Cutlet backend.
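Both properties defer loading an optional dependency until it is first used. The sketch below shows that general pattern with functools.cached_property and importlib.import_module. The class and attribute names are hypothetical, and the standard-library json module stands in for pyopenjtalk so the example runs without it installed; this is not kokorog2p's implementation.

```python
import importlib
from functools import cached_property
from types import ModuleType


class LazyBackendHolder:
    """Hypothetical holder that imports a module on first access only."""

    def __init__(self, module_name: str) -> None:
        self.module_name = module_name
        self.import_count = 0  # how many times the loader actually ran

    @cached_property
    def backend(self) -> ModuleType:
        # Executed once; cached_property then stores the module on the instance.
        self.import_count += 1
        return importlib.import_module(self.module_name)


holder = LazyBackendHolder("json")  # stand-in for "pyopenjtalk"
print(holder.import_count)  # 0: nothing loaded at construction time
print(holder.backend.dumps({"a": 1}))  # first access runs the import
print(holder.import_count)  # 1: later accesses reuse the cached module
```

With this pattern, constructing the object stays cheap, and a backend that is never accessed is never imported.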

static pron2moras(pron: str) → list[str]

Convert pronunciation to mora list.
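A mora is the timing unit of Japanese: most kana count as one mora, small glide and vowel kana (ャ, ュ, ョ, ァ, and so on) merge with the kana before them, and ッ, ン and ー each count as a mora on their own. A minimal sketch of such a split follows, assuming plain katakana input without accent markers. The function name is hypothetical, and the real pron2moras may handle more cases.

```python
# Small kana that combine with the preceding kana into a single mora.
# ッ (sokuon), ン and ー are morae on their own, so they are not listed here.
SMALL_KANA = set("ァィゥェォャュョヮ")


def pron2moras_sketch(pron: str) -> list[str]:
    """Split a katakana pronunciation into morae (hypothetical sketch)."""
    moras: list[str] = []
    for ch in pron:
        if ch in SMALL_KANA and moras:
            # Merge with the previous kana, e.g. キ + ョ -> キョ.
            moras[-1] += ch
        else:
            moras.append(ch)
    return moras


print(pron2moras_sketch("トーキョー"))  # ['ト', 'ー', 'キョ', 'ー']
print(pron2moras_sketch("ガッコウ"))    # ['ガ', 'ッ', 'コ', 'ウ']
```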

__call__(text: str) → list[GToken]

Convert text to tokens with phonemes.

Args:

text: Input text to convert.

Returns:

List of GToken objects with phonemes.

lookup(word: str, tag: str | None = None) → str | None

Look up a word’s phonemes.

Args:

word: The word to look up.
tag: Optional POS tag (ignored for Japanese).

Returns:

Phoneme string or None.
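lookup signals an unknown word by returning None, and tag is accepted only for interface compatibility. The sketch below illustrates that contract with a hypothetical in-memory lexicon and a caller-side fallback to an unk-style placeholder. The lexicon values are placeholders, not real kokorog2p output, and the sketch does not reflect how the real method resolves words.

```python
from typing import Optional

# Hypothetical lexicon; the value is a placeholder, not a real phoneme string.
LEXICON: dict[str, str] = {"東京": "PHONEMES_FOR_TOKYO"}


def lookup_sketch(word: str, tag: Optional[str] = None) -> Optional[str]:
    """Return phonemes for word, or None if unknown; tag is intentionally unused."""
    return LEXICON.get(word)


def phonemes_or_unk(word: str, unk: str = "") -> str:
    """Caller-side fallback: substitute the unk placeholder on a miss."""
    result = lookup_sketch(word)
    # Compare against None explicitly so an empty phoneme string is kept.
    return result if result is not None else unk


print(phonemes_or_unk("東京"))             # PHONEMES_FOR_TOKYO
print(phonemes_or_unk("未知語", unk="?"))  # ?
```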

phonemize(text: str) → str

Convert text to phonemes.

Args:

text: Input text to convert.

Returns:

Phoneme string.
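__call__ returns per-token results, while phonemize returns a single string. As a hypothetical sketch of how the two relate, assuming phonemize concatenates token phonemes (the actual separator and the handling of tokens without phonemes are not documented here), a token list can be collapsed as follows. TokenSketch models only the text and phonemes fields used in the Examples section; it is not the real GToken class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenSketch:
    """Hypothetical stand-in for GToken with only the fields used in Examples."""
    text: str
    phonemes: Optional[str] = None


def join_phonemes(tokens: list[TokenSketch], sep: str = "") -> str:
    """Concatenate per-token phonemes, skipping tokens with none (None or empty)."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


tokens = [TokenSketch("こんにちは", "p1"), TokenSketch("、"), TokenSketch("世界", "p2")]
print(join_phonemes(tokens, sep=" "))  # p1 p2
```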

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: version string (“1.1” or “1.0”).

Examples

from kokorog2p.ja import JapaneseG2P

g2p = JapaneseG2P(language="ja")
tokens = g2p("こんにちは世界")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Features

  • pyopenjtalk for full Japanese text analysis

  • Mora-based phoneme generation

  • Automatic pitch accent assignment

  • Japanese numeral handling
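Pitch accent in standard (Tokyo) Japanese is commonly described by a word's mora count plus the position of its accent nucleus, the mora after which pitch falls. The hypothetical helper below applies the textbook rule for turning those two numbers into a per-mora high/low pattern. It ignores phrase-level effects and is not taken from kokorog2p, whose internal accent assignment this page does not describe.

```python
def pitch_pattern(n_moras: int, accent: int) -> list[str]:
    """Textbook Tokyo-Japanese word pitch pattern (hypothetical helper).

    accent is the 1-based mora after which pitch falls; 0 means unaccented
    (heiban). Returns one "H" or "L" label per mora.
    """
    if n_moras < 1 or not 0 <= accent <= n_moras:
        raise ValueError("need n_moras >= 1 and 0 <= accent <= n_moras")
    pattern = []
    for i in range(1, n_moras + 1):
        if accent == 1:
            # Atamadaka: high on the first mora, low afterwards.
            pattern.append("H" if i == 1 else "L")
        elif i == 1:
            # Otherwise the first mora is low and pitch rises on the second.
            pattern.append("L")
        elif accent == 0 or i <= accent:
            pattern.append("H")
        else:
            pattern.append("L")
    return pattern


for word, acc in [("箸", 1), ("橋", 2), ("端", 0)]:
    print(word, "".join(pitch_pattern(2, acc)))
# 箸 HL
# 橋 LH
# 端 LH
```

橋 and 端 (both hashi) differ only on a following particle, which stays low after 橋 and high after 端; this word-level sketch does not model particles.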