Welcome to kokorog2p’s documentation!
kokorog2p is a unified G2P (Grapheme-to-Phoneme) library for Kokoro TTS, providing high-quality text-to-phoneme conversion for multiple languages.
Features
Multi-language support: English (US/GB), German, French, Czech, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Hebrew
Mixed-language detection: Automatic detection and handling of texts mixing multiple languages
Dictionary-based lookup with large gold/silver tier lexicons for select languages
Rule-based G2P for Romance and Slavic languages with comprehensive phonological rules
espeak-ng integration as a fallback for out-of-vocabulary words
Automatic IPA to Kokoro phoneme conversion
Number and currency handling across all languages
Stress assignment based on linguistic rules
High performance with caching and optimized lookup
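The lookup-then-fallback flow described in the features above can be sketched as follows. This is a minimal illustration under stated assumptions, not kokorog2p's actual implementation: `TOY_LEXICON`, `toy_fallback`, `lookup_word`, and `toy_phonemize` are hypothetical names, and the fallback only marks unknown words instead of calling a real rule-based or espeak-ng backend.

```python
from functools import lru_cache

# Toy lexicon for illustration only; not taken from kokorog2p's dictionaries.
TOY_LEXICON = {
    "hello": "hˈɛlO",
    "world": "wˈɜɹld",
}


def toy_fallback(word: str) -> str:
    """Hypothetical stand-in for a rule-based or espeak-ng backend.

    It only wraps the word in angle brackets so the fallback path is visible.
    """
    return f"<{word}>"


@lru_cache(maxsize=4096)
def lookup_word(word: str) -> str:
    """Return the lexicon entry for a word, or the fallback result if absent."""
    key = word.lower()
    if key in TOY_LEXICON:
        return TOY_LEXICON[key]
    return toy_fallback(key)


def toy_phonemize(text: str) -> str:
    """Split on whitespace and convert each word; repeated words hit the cache."""
    return " ".join(lookup_word(word) for word in text.split())


print(toy_phonemize("Hello world"))
print(toy_phonemize("Hello kokoro"))
```

Caching at the word level means a repeated word skips both the lexicon check and the fallback call, which matters most when the fallback is an external backend.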
Quick Start
from kokorog2p import phonemize
# English
phonemes = phonemize("Hello world!", language="en-us")
print(phonemes) # hˈɛlO wˈɜɹld!
# German
phonemes = phonemize("Guten Tag", language="de")
print(phonemes) # ɡuːtn̩ taːk
# French
phonemes = phonemize("Bonjour", language="fr")
print(phonemes) # bɔ̃ʒuʁ
Installation
# Core package
pip install kokorog2p
# With English support (includes spaCy)
# Quoting the extras keeps shells such as zsh from expanding the brackets
pip install "kokorog2p[en]"
# With espeak-ng backend
pip install "kokorog2p[espeak]"
# Full installation (all languages and backends)
pip install "kokorog2p[all]"
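After installing, a quick way to see which packages ended up in the environment is to ask Python whether it can locate them. The sketch below uses only the standard library; the helper name `installed` is hypothetical, and `spacy` as the import name for the English extra is an assumption based on the note above that the extra includes spaCy.

```python
from importlib.util import find_spec


def installed(modules):
    """Map each top-level module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in modules}


# Module names are assumptions; adjust the list to the extras you installed.
print(installed(["kokorog2p", "spacy"]))
```

A `False` entry means the corresponding package is missing from the active environment, which usually points to the extra not having been installed there.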