Japanese API
Japanese G2P uses pyopenjtalk by default for text analysis and mora-based phoneme generation; cutlet is available as an alternative backend.
Main Class
- class kokorog2p.ja.JapaneseG2P(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs)
Bases: G2PBase
Japanese G2P using pyopenjtalk or cutlet.
- Example:
>>> g2p = JapaneseG2P()
>>> tokens = g2p("こんにちは")
- __init__(language: str = 'ja', use_espeak_fallback: bool = True, use_spacy: bool = False, spacy_model: str = 'ja_core_news_sm', backend: str = 'pyopenjtalk', unk: str = '', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs) -> None
Initialize the Japanese G2P.
- Args:
language: Language code (e.g., 'ja', 'ja-jp').
use_espeak_fallback: Whether to use espeak for unknown words.
use_spacy: Reserved for API consistency. Japanese uses the pyopenjtalk/cutlet backends for tokenization and phonemization.
spacy_model: Reserved for API consistency when use_spacy is enabled.
backend: Backend to use ("pyopenjtalk" or "cutlet").
unk: Unknown token placeholder.
load_silver: If True, load the silver-tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use and defaults to True for consistency with other languages.
load_gold: If True, load the gold-tier dictionary if available. Japanese does not currently use the dictionary system, so this parameter is reserved for future use and defaults to True for consistency with other languages.
version: Model version ("1.0" for base, "1.1" for multilingual). Default: "1.0".
**kwargs: Additional arguments.
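A minimal sketch of choosing a backend at construction time. The helper name `make_japanese_g2p` and the `SUPPORTED_BACKENDS` tuple are hypothetical and not part of kokorog2p. This page does not say whether the library itself rejects other backend strings, so the helper performs that check, and it returns None when kokorog2p is not installed.

```python
# The two values documented for the `backend` argument.
SUPPORTED_BACKENDS = ("pyopenjtalk", "cutlet")


def make_japanese_g2p(backend: str = "pyopenjtalk"):
    """Hypothetical helper: build a JapaneseG2P, or return None if kokorog2p is missing."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"backend must be one of {SUPPORTED_BACKENDS}, got {backend!r}"
        )
    try:
        # Imported lazily so the helper can be defined without kokorog2p installed.
        from kokorog2p.ja import JapaneseG2P
    except ImportError:
        return None
    return JapaneseG2P(backend=backend)
```

Calling `make_japanese_g2p("cutlet")` would then select the cutlet backend, assuming both kokorog2p and cutlet are installed.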
- property pyopenjtalk
Lazy import of pyopenjtalk.
- property cutlet
Lazy initialization of Cutlet backend.
- __call__(text: str) -> list[GToken]
Convert text to tokens with phonemes.
- Args:
text: Input text to convert.
- Returns:
List of GToken objects with phonemes.
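One way to consume the returned list is to join the per-token phonemes into a single string. The sketch below is an illustration, not part of the library: `join_phonemes` is a hypothetical helper, the `text` and `phonemes` attribute names are taken from the Examples section of this page, and treating `phonemes` as possibly empty or None is an assumption. Stand-in objects with made-up phoneme strings are used so the snippet runs without kokorog2p.

```python
from types import SimpleNamespace
from typing import Iterable


def join_phonemes(tokens: Iterable, sep: str = " ") -> str:
    """Join the phonemes of all tokens, skipping tokens that have none."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


# Stand-in tokens; the phoneme strings are placeholders, not real G2P output.
demo_tokens = [
    SimpleNamespace(text="A", phonemes="a1"),
    SimpleNamespace(text="、", phonemes=None),  # assumed: some tokens carry no phonemes
    SimpleNamespace(text="B", phonemes="b2"),
]

print(join_phonemes(demo_tokens))  # prints "a1 b2"
```

With the real class, the same helper could be applied to the list returned by calling a `JapaneseG2P` instance on a string.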
- lookup(word: str, tag: str | None = None) -> str | None
Look up a word’s phonemes.
- Args:
word: The word to look up.
tag: Optional POS tag (ignored for Japanese).
- Returns:
Phoneme string or None.
Examples
from kokorog2p.ja import JapaneseG2P

g2p = JapaneseG2P(language="ja")
tokens = g2p("こんにちは世界")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
Features
- pyopenjtalk for full Japanese text analysis
- Mora-based phoneme generation
- Automatic pitch accent assignment
- Japanese numeral handling
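To make "mora-based" concrete, here is a simplified sketch of splitting a kana string into morae. It is a generic illustration, not the algorithm used by pyopenjtalk or kokorog2p, and `split_morae` is a hypothetical name. The rule assumed here is that a small kana (such as ゃ, ゅ, ょ) attaches to the preceding kana, while every other character, including っ, ん, and ー, counts as its own mora.

```python
# Small kana that combine with the preceding kana into one mora.
# The sokuon (っ/ッ) is deliberately excluded: it is a mora on its own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Split a hiragana/katakana string into morae using the simplified rule above."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # attach to the previous mora, e.g. き + ょ -> きょ
        else:
            morae.append(ch)
    return morae


print(split_morae("きょうと"))  # prints ['きょ', 'う', 'と']
```

Kanji input is out of scope for this sketch; it would first need to be converted to a kana reading by a text analyzer.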