Czech API

Czech G2P converts text to phonemes using rules that cover palatalization, vowel length, final devoicing, and voicing assimilation.

Main Class

class kokorog2p.cs.CzechG2P(language: str = 'cs-cz', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any)

Bases: G2PBase

Czech G2P converter using rule-based phoneme conversion with fallback options.

This class provides grapheme-to-phoneme conversion for Czech text using phonological rules for voicing assimilation, palatalization, and other Czech-specific features, with optional fallback to espeak or goruut.

Example:

>>> from kokorog2p.cs import CzechG2P
>>> g2p = CzechG2P()
>>> tokens = g2p("Dobrý den")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")

__init__(language: str = 'cs-cz', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any) → None

Initialize the Czech G2P converter.

Args:

language: Language code (default: ‘cs-cz’).
use_espeak_fallback: Whether to use espeak for OOV words.
use_goruut_fallback: Whether to use goruut for OOV words.
unk: Character to use for unknown characters.
load_silver: If True, load the silver tier dictionary if available. Czech currently uses rule-based G2P, so this parameter is reserved for future use and kept for consistency. Defaults to True.
load_gold: If True, load the gold tier dictionary if available. Czech currently uses rule-based G2P, so this parameter is reserved for future use and kept for consistency. Defaults to True.
expand_abbreviations: If True, expand common abbreviations (e.g., “Dr.” → “Doktor”). Defaults to True.
enable_context_detection: If True, use context-aware expansion for ambiguous abbreviations. Defaults to True.

Raises:

ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
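
The mutual-exclusion check described under Raises can be sketched as follows. This is a standalone illustration of the documented behavior, not the library's code; the helper name `check_fallbacks`, its return values, and the error message are all hypothetical.

```python
from typing import Optional


def check_fallbacks(
    use_espeak_fallback: bool = False,
    use_goruut_fallback: bool = False,
) -> Optional[str]:
    """Return the name of the selected fallback, or None if neither is enabled.

    Hypothetical helper: mirrors the documented rule that the two
    fallbacks cannot be enabled at the same time.
    """
    if use_espeak_fallback and use_goruut_fallback:
        # Assumed message; the real library's wording may differ.
        raise ValueError("Enable at most one of espeak or goruut fallback")
    if use_espeak_fallback:
        return "espeak"
    if use_goruut_fallback:
        return "goruut"
    return None
```

In practice this means a caller picks one fallback, for example `CzechG2P(use_espeak_fallback=True)`, and leaves the other at its default of False.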

__call__(text: str) → list[GToken]

Convert text to a list of tokens with phonemes.

Args:

text: Input text to convert.

Returns:

List of GToken objects with phonemes assigned.

lookup(word: str, tag: str | None = None) → str | None

Look up a word in the dictionary.

For Czech, this just converts the word to phonemes using rules.

Args:

word: The word to look up.
tag: Optional POS tag (not used for Czech).

Returns:

Phoneme string.

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: the version string (“1.1” or “1.0”).

Examples

from kokorog2p.cs import CzechG2P

g2p = CzechG2P(language="cs-cz")
tokens = g2p("Dobrý den, jak se máte?")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Phonological Rules

Czech G2P implements the following phonological rules:

Palatalization: d+i → ɟ, t+i → c, n+i → ɲ
Long vowels: á → aː, í → iː, ú/ů → uː, é → eː, ó → oː
ř phoneme: Special raised alveolar trill [r̝]
CH digraph: ch → [x]
Final devoicing: Voiced consonants become voiceless at the end of a word (e.g., the final d of led is pronounced [t])
Voicing assimilation: Consonant clusters assimilate in voicing
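
To make the rules above concrete, here is a minimal standalone sketch of a few of them: long vowels, the ch digraph, palatalization before i/í, and final devoicing. It is not kokorog2p's implementation. The function `toy_czech_g2p` and its lookup tables are hypothetical, letters not covered by a rule pass through unchanged (so vowel qualities and letters such as c, h, or ř are not handled), and voicing assimilation is omitted.

```python
# Toy subset of the rules listed above; illustrative only, not the library's code.
LONG_VOWELS = {"á": "aː", "í": "iː", "ú": "uː", "ů": "uː", "é": "eː", "ó": "oː"}
PALATALIZED = {"d": "ɟ", "t": "c", "n": "ɲ"}
# Voiced -> voiceless counterparts, applied only to the last phoneme of a word.
DEVOICED = {"b": "p", "d": "t", "ɟ": "c", "g": "k", "v": "f", "z": "s"}


def toy_czech_g2p(word: str) -> str:
    """Convert a single word using a small, simplified set of rules."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            # CH digraph: two letters map to one phoneme.
            phones.append("x")
            i += 2
            continue
        if ch in PALATALIZED and nxt in ("i", "í"):
            # Palatalization: d/t/n before i or í; the vowel is kept.
            phones.append(PALATALIZED[ch])
        elif ch in LONG_VOWELS:
            phones.append(LONG_VOWELS[ch])
        else:
            # Anything without a rule passes through unchanged (toy simplification).
            phones.append(ch)
        i += 1
    if phones and phones[-1] in DEVOICED:
        # Final devoicing.
        phones[-1] = DEVOICED[phones[-1]]
    return "".join(phones)


print(toy_czech_g2p("led"))    # let   (final devoicing)
print(toy_czech_g2p("dík"))    # ɟiːk  (palatalization + long vowel)
print(toy_czech_g2p("ticho"))  # cixo  (palatalization + ch digraph)
```

A real converter needs ordered rule application and context beyond one letter of lookahead, which is why the sketch should be read only as a reading aid for the list above.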