Installation

kokorog2p can be installed with different feature sets depending on your needs.

Basic Installation

The core package has minimal dependencies:

pip install kokorog2p

This gives you:

  • Core G2P functionality

  • Basic phoneme conversion

  • German and Czech support (rule-based)

  • Number handling

With Language Support

English (with spaCy)

For full English support with POS tagging and advanced tokenization:

pip install "kokorog2p[en]"

This includes:

  • spaCy with English model

  • US and GB dictionaries (gold/silver tiers)

  • Context-dependent pronunciation

  • Number and currency expansion

By default, English G2P uses the spaCy model en_core_web_md for POS tagging and downloads it on first use if it is missing. You can select a different model with the spacy_model=... argument.

French

For French support:

pip install "kokorog2p[fr]"

This includes:

  • French gold dictionary

  • espeak-ng fallback

  • Number and currency handling

Chinese

For Chinese support:

pip install "kokorog2p[zh]"

This includes:

  • jieba for tokenization

  • pypinyin for pinyin conversion

  • cn2an for number handling

  • Tone sandhi rules

Japanese

For Japanese support:

pip install "kokorog2p[ja]"

This includes:

  • pyopenjtalk for text analysis

  • Cutlet for romanization

  • Mora-based phoneme generation

Mixed-Language Detection

For automatic language detection in mixed-language texts:

pip install "kokorog2p[mixed]"

This includes:

  • lingua-language-detector for high-accuracy detection

  • Automatic routing to appropriate G2P engines

  • Support for 17+ languages

  • Caching for performance

With Backend Support

espeak-ng Backend

For espeak-ng fallback (recommended for production):

pip install "kokorog2p[espeak]"

This includes:

  • espeak-ng Python bindings

  • Fallback for out-of-vocabulary (OOV) words

  • Support for 100+ languages via espeak-ng

goruut Backend

For the goruut backend (experimental):

pip install "kokorog2p[goruut]"

Full Installation

To install all features:

pip install "kokorog2p[all]"

This includes all language packs and backends.
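
When only some features are needed, pip's standard extras syntax accepts several extras in one requirement, separated by commas. The sketch below builds such a pip command from Python. The helper name pip_install_command is illustrative and not part of kokorog2p; the extras names are the ones listed on this page.

```python
import sys

# Extras named on this page; the helper itself is illustrative only.
KNOWN_EXTRAS = {"en", "fr", "zh", "ja", "mixed", "espeak", "goruut", "all"}


def pip_install_command(extras):
    """Return an argv list that installs kokorog2p with the given extras."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"Unknown extras: {unknown}")
    requirement = "kokorog2p"
    if extras:
        # Sort and de-duplicate for a stable, readable requirement string.
        requirement += "[" + ",".join(sorted(set(extras))) + "]"
    # Running pip through the current interpreter targets the active environment.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    print(" ".join(pip_install_command(["en", "fr", "espeak"])))
```

Passing the returned list to subprocess.run(..., check=True) would perform the installation; the sketch only prints the command so that nothing is installed by accident.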

Development Installation

For development, clone the repository and install in editable mode:

git clone https://github.com/hexgrad/kokorog2p.git
cd kokorog2p
pip install -e ".[dev]"

This includes:

  • All language packs and backends

  • Development tools (pytest, ruff, mypy)

  • Pre-commit hooks

  • Documentation building tools

System Dependencies

espeak-ng

If using the espeak backend, you’ll need espeak-ng installed on your system:

Ubuntu/Debian:

sudo apt-get install espeak-ng

macOS:

brew install espeak-ng

Windows:

Download the installer from the espeak-ng releases page.
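
To confirm that the system package is visible, you can look for the espeak-ng executable on PATH. This is a minimal sketch using only the standard library. It assumes the installation places an espeak-ng command on PATH, which may not hold for every Windows setup, and it does not test the Python bindings.

```python
import shutil
import subprocess


def find_espeak_ng():
    """Return the path of the espeak-ng executable, or None if not on PATH."""
    return shutil.which("espeak-ng")


def espeak_ng_version(executable):
    """Return the first line printed by `<executable> --version`, or None on failure."""
    try:
        completed = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Covers a missing executable as well as a hung process.
        return None
    lines = completed.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    path = find_espeak_ng()
    if path is None:
        print("espeak-ng not found on PATH")
    else:
        print(f"espeak-ng found at {path}: {espeak_ng_version(path)}")
```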

Verifying Installation

To verify your installation:

import kokorog2p
print(kokorog2p.__version__)

# Test basic functionality
from kokorog2p import phonemize
result = phonemize("Hello world!", language="en-us")
print(result)

If you see phoneme output, your installation is successful!
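
The check above raises an exception if the package is missing. A more defensive variant, sketched below, reads the installed version from package metadata with the standard library, so it reports a missing package instead of failing. The helper name installed_version is illustrative.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("kokorog2p")
    if version is None:
        print("kokorog2p is not installed in this environment")
    else:
        print(f"kokorog2p {version} is installed")
```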

Troubleshooting

Import Errors

If you get import errors for optional dependencies, check which of them can be found in your environment:

# Check what's installed
import importlib.util

# Check for spaCy
spacy_available = importlib.util.find_spec("spacy") is not None
print(f"spaCy available: {spacy_available}")

# Check for espeak
espeak_available = importlib.util.find_spec("espeakng_loader") is not None
print(f"espeak-ng available: {espeak_available}")
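
The same check extends to the other optional dependencies named on this page. The sketch below maps each extra to the import names of its packages. The mapping is an assumption based on those packages' usual import names (for example, lingua-language-detector is imported as lingua), apart from espeakng_loader, which comes from the snippet above.

```python
import importlib.util

# Extra -> import names to probe. Assumed from each package's usual import name.
OPTIONAL_MODULES = {
    "en": ["spacy"],
    "zh": ["jieba", "pypinyin", "cn2an"],
    "ja": ["pyopenjtalk", "cutlet"],
    "mixed": ["lingua"],
    "espeak": ["espeakng_loader"],
}


def is_available(module_name):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report_optional_dependencies():
    """Return {extra: {module: bool}} for every module listed above."""
    return {
        extra: {name: is_available(name) for name in names}
        for extra, names in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for extra, modules in report_optional_dependencies().items():
        for name, found in modules.items():
            status = "ok" if found else "missing"
            print(f"[{extra}] {name}: {status}")
```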

Missing Language Models

If spaCy models are missing, download them manually:

# Default English model used by kokorog2p
python -m spacy download en_core_web_md

# Optional alternatives
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg

Performance Issues

For better performance:

  1. Use dictionary-based G2P when possible (English, German, French)

  2. Enable caching (enabled by default)

  3. Reuse G2P instances instead of creating new ones

  4. Consider using espeak-ng fallback only for truly OOV words
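
Point 3 can be sketched with a small memoized factory, so that each language gets one long-lived object shared by all callers. kokorog2p's own class names and constructors are not shown on this page, so StandInG2P below is a hypothetical placeholder for whatever G2P object your code builds; only the caching pattern is the point.

```python
from functools import lru_cache


class StandInG2P:
    """Hypothetical placeholder for an expensive-to-construct G2P object."""

    instances_created = 0

    def __init__(self, language):
        # A real G2P object would do its costly setup here.
        StandInG2P.instances_created += 1
        self.language = language


@lru_cache(maxsize=None)
def get_shared_g2p(language):
    """Build the G2P object for a language once and reuse it afterwards."""
    return StandInG2P(language)


if __name__ == "__main__":
    first = get_shared_g2p("en-us")
    second = get_shared_g2p("en-us")
    print(first is second)               # True: the same object is returned
    print(StandInG2P.instances_created)  # 1
```

A module-level factory keeps the reuse in one place; in a long-running service the same effect can be had by creating the instance once at startup and passing it to the code that needs it.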