Welcome to kokorog2p’s documentation!

kokorog2p is a unified G2P (Grapheme-to-Phoneme) library for Kokoro TTS, providing high-quality text-to-phoneme conversion for multiple languages.

Features

Multi-language support: English (US/GB), German, French, Czech, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Hebrew
Mixed-language detection: Automatic detection and handling of texts mixing multiple languages
Dictionary-based lookup with large gold/silver tier lexicons for select languages
Rule-based G2P for Romance and Slavic languages with comprehensive phonological rules
espeak-ng integration as a fallback for out-of-vocabulary words
Automatic IPA to Kokoro phoneme conversion
Number and currency handling across all languages
Stress assignment based on linguistic rules
High performance with caching and optimized lookup

Quick Start

from kokorog2p import phonemize

# English
phonemes = phonemize("Hello world!", language="en-us")
print(phonemes)  # hˈɛlO wˈɜɹld!

# German
phonemes = phonemize("Guten Tag", language="de")
print(phonemes)  # ɡuːtn̩ taːk

# French
phonemes = phonemize("Bonjour", language="fr")
print(phonemes)  # bɔ̃ʒuʁ

Installation

# Core package
pip install kokorog2p

# With English support (includes spaCy)
pip install kokorog2p[en]

# With espeak-ng backend
pip install kokorog2p[espeak]

# Full installation (all languages and backends)
pip install kokorog2p[all]

User Guide

API Reference

Development

Welcome to kokorog2p’s documentation!

Features

Quick Start

Installation

Indices and tables