Japanese API

Japanese G2P uses pyopenjtalk by default for text analysis and mora-based phoneme generation; cutlet is available as an alternative backend.

Main Class

class kokorog2p.ja.JapaneseG2P(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs)

Bases: G2PBase

Japanese G2P using pyopenjtalk or cutlet.

Example:

>>> g2p = JapaneseG2P()
>>> tokens = g2p("こんにちは")

__init__(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs) → None

Initialize the Japanese G2P.

Args:

language: Language code (e.g., ‘ja’, ‘ja-jp’).
use_espeak_fallback: Whether to use espeak for unknown words.
use_spacy: Reserved for API consistency. Japanese uses the pyopenjtalk/cutlet backends for tokenization and phonemization.
spacy_model: Reserved for API consistency when use_spacy is enabled.
backend: Backend to use (“pyopenjtalk” or “cutlet”).
unk: Unknown token placeholder.
load_silver: If True, load the silver tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use and API consistency. Defaults to True.
load_gold: If True, load the gold tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use and API consistency. Defaults to True.
version: Model version (“1.0” for base, “1.1” for multilingual). Default: “1.0”.
**kwargs: Additional arguments.
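
The documented options can be collected into keyword arguments before constructing the class. The following is a minimal sketch and not part of kokorog2p: the helper name japanese_g2p_kwargs and its validation are hypothetical, and whether the library itself rejects unknown values is not stated here. Only the parameter names and the allowed backend and version strings are taken from the argument list.

```python
# Hypothetical helper (not part of kokorog2p): validates the documented
# backend/version strings and returns kwargs for JapaneseG2P.
VALID_BACKENDS = ("pyopenjtalk", "cutlet")
VALID_VERSIONS = ("1.0", "1.1")


def japanese_g2p_kwargs(backend="pyopenjtalk", version="1.0", unk=""):
    """Build constructor kwargs, rejecting values the docs do not list."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    if version not in VALID_VERSIONS:
        raise ValueError(f"unsupported version: {version!r}")
    return {
        "language": "ja",
        "backend": backend,
        "version": version,
        "unk": unk,
    }
```

The resulting dict could then be passed as JapaneseG2P(**japanese_g2p_kwargs(backend="cutlet")), assuming the selected backend package is installed.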

property pyopenjtalk: Lazy import of pyopenjtalk.

property cutlet: Lazy initialization of Cutlet backend.

static pron2moras(pron: str) → list[str]: Convert a pronunciation string to a list of moras.
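
To illustrate what splitting a pronunciation into moras means, here is a rough standalone sketch. It is not the library's pron2moras implementation: the function name split_moras and the rule it applies (a small kana attaches to the preceding character, every other character starts a new mora) are assumptions, and it expects a katakana reading as input.

```python
# Illustrative sketch only; kokorog2p's pron2moras may behave differently.
# Small katakana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ァィゥェォャュョヮ")


def split_moras(pron: str) -> list:
    """Split a katakana pronunciation string into a list of moras."""
    moras = []
    for ch in pron:
        if ch in SMALL_KANA and moras:
            # e.g. キ + ョ forms the single mora キョ
            moras[-1] += ch
        else:
            # Regular kana, ン, ッ and ー each count as one mora
            moras.append(ch)
    return moras


print(split_moras("キョウト"))
print(split_moras("ガッコー"))
```

Under these assumptions the moraic nasal ン, the geminate marker ッ and the long-vowel mark ー are each treated as a mora of their own.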

__call__(text: str) → list[GToken]

Convert text to tokens with phonemes.

Args: text: Input text to convert.
Returns: List of GToken objects with phonemes.
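
The returned tokens can be post-processed like any other list. The sketch below formats tokens for inspection, assuming only that each token exposes text and phonemes attributes, as used in the Examples section. FakeToken is a hypothetical stand-in for GToken so the snippet runs without a backend installed, the phoneme strings are placeholders rather than real output, and the possibility that phonemes is None or empty is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in for GToken (only text and phonemes)."""
    text: str
    phonemes: Optional[str]


def format_tokens(tokens, missing="<none>"):
    """Return one 'text -> phonemes' line per token.

    Tokens whose phonemes are None or empty are shown with `missing`.
    """
    lines = []
    for token in tokens:
        phonemes = token.phonemes if token.phonemes else missing
        lines.append(f"{token.text} -> {phonemes}")
    return lines


# Placeholder phoneme strings, not actual backend output.
sample = [FakeToken("A", "ph-a"), FakeToken("B", None)]
for line in format_tokens(sample):
    print(line)
```

With a real instance, the same helper could be called as format_tokens(g2p("こんにちは世界")).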

lookup(word: str, tag: str | None = None) → str | None

Look up a word’s phonemes.

Args: word: The word to look up. tag: Optional POS tag (ignored for Japanese).
Returns: Phoneme string, or None.

phonemize(text: str) → str

Convert text to phonemes.

Args: text: Input text to convert.
Returns: Phoneme string.

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns: Model identifier as a version string (“1.0” or “1.1”).

Examples

from kokorog2p.ja import JapaneseG2P

g2p = JapaneseG2P(language="ja")
tokens = g2p("こんにちは世界")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Features

pyopenjtalk for full Japanese text analysis
Mora-based phoneme generation
Automatic pitch accent assignment
Japanese numeral handling