Czech API

Czech G2P provides rule-based phoneme conversion with comprehensive phonological rules.

Main Class

class kokorog2p.cs.CzechG2P(language: str = 'cs-cz', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any)

Bases: G2PBase

Czech G2P converter using rule-based phoneme conversion with fallback options.

This class provides grapheme-to-phoneme conversion for Czech text using phonological rules for voicing assimilation, palatalization, and other Czech-specific features, with optional fallback to espeak or goruut.

Example:
>>> from kokorog2p.cs import CzechG2P
>>> g2p = CzechG2P()
>>> tokens = g2p("Dobrý den")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")

__init__(language: str = 'cs-cz', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any) → None

Initialize the Czech G2P converter.

Args:

language: Language code (default: 'cs-cz').

use_espeak_fallback: Whether to use espeak for out-of-vocabulary (OOV) words.

use_goruut_fallback: Whether to use goruut for out-of-vocabulary (OOV) words.

unk: Character to use for unknown characters.

load_silver: If True, load the silver-tier dictionary if available. Czech currently uses rule-based G2P, so this parameter is reserved for future use and kept for consistency. Defaults to True.

load_gold: If True, load the gold-tier dictionary if available. Czech currently uses rule-based G2P, so this parameter is reserved for future use and kept for consistency. Defaults to True.

expand_abbreviations: If True, expand common abbreviations (e.g., “Dr.” → “Doktor”). Defaults to True.

enable_context_detection: If True, use context-aware expansion for ambiguous abbreviations. Defaults to True.

Raises:

ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
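
Because the two fallback flags are mutually exclusive, one way to avoid the ValueError is to derive both flags from a single choice. The following is a minimal sketch; fallback_kwargs is a hypothetical helper written for illustration and is not part of kokorog2p.

```python
from __future__ import annotations


def fallback_kwargs(fallback: str | None) -> dict[str, bool]:
    """Turn one fallback name into the two boolean constructor flags.

    Hypothetical helper: at most one flag can ever be True, so the
    "both fallbacks enabled" error cannot be triggered through it.
    """
    if fallback not in (None, "espeak", "goruut"):
        raise ValueError(f"unknown fallback: {fallback!r}")
    return {
        "use_espeak_fallback": fallback == "espeak",
        "use_goruut_fallback": fallback == "goruut",
    }


print(fallback_kwargs("goruut"))
# {'use_espeak_fallback': False, 'use_goruut_fallback': True}

# Intended use, assuming kokorog2p is installed:
#   from kokorog2p.cs import CzechG2P
#   g2p = CzechG2P(**fallback_kwargs("espeak"))
```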

__call__(text: str) → list[GToken]

Convert text to a list of tokens with phonemes.

Args:

text: Input text to convert.

Returns:

List of GToken objects with phonemes assigned.
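
A common next step is to collapse the returned tokens into one phoneme string. The sketch below uses a hypothetical FakeToken stand-in so that it runs without the library; it assumes only the text and phonemes attributes shown in the class example, and it further assumes that phonemes may be empty or None for some tokens. The phoneme values are placeholders, not real converter output.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeToken:
    """Hypothetical stand-in for GToken with just the two attributes used here."""
    text: str
    phonemes: Optional[str]


def join_phonemes(tokens: Iterable[FakeToken], unk: str = "?") -> str:
    """Join per-token phonemes with spaces, substituting unk where missing."""
    return " ".join(tok.phonemes if tok.phonemes else unk for tok in tokens)


# Placeholder tokens; real ones would come from calling a CzechG2P instance.
tokens = [FakeToken("ano", "ano"), FakeToken("???", None)]
print(join_phonemes(tokens))  # ano ?
```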

lookup(word: str, tag: str | None = None) → str | None

Look up a word in the dictionary.

For Czech, this just converts the word to phonemes using rules.

Args:

word: The word to look up.

tag: Optional POS tag (not used for Czech).

Returns:

Phoneme string.

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: version string (“1.1” or “1.0”).

Examples

from kokorog2p.cs import CzechG2P

g2p = CzechG2P(language="cs-cz")
tokens = g2p("Dobrý den, jak se máte?")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Phonological Rules

Czech G2P implements the following phonological rules:

  • Palatalization: d+i → ɟ, t+i → c, n+i → ɲ

  • Long vowels: á → aː, í → iː, ú/ů → uː, é → eː, ó → oː

  • ř phoneme: Special raised alveolar trill [r̝]

  • CH digraph: ch → [x]

  • Final devoicing: Voiced consonants become voiceless at word end

  • Voicing assimilation: Consonant clusters assimilate in voicing
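
To make the rules above concrete, here is a toy sketch that applies a handful of them to single lowercase words. It is not the kokorog2p implementation: the function name toy_czech_g2p, the tables, and the rule ordering are all assumptions made for illustration, and only a few letters are covered. It also assumes that long í triggers palatalization the same way i does.

```python
# Toy sketch of a few Czech letter-to-sound rules; NOT the library's code.

# One-to-one letter mappings: the long vowels listed above, plus "c" -> "ts"
# so the letter does not collide with the IPA palatal stop "c".
LETTERS = {"á": "aː", "í": "iː", "ú": "uː", "ů": "uː",
           "é": "eː", "ó": "oː", "c": "ts"}

# d, t, n become palatal before i (and, by assumption here, before í).
PALATAL = {"d": "ɟ", "t": "c", "n": "ɲ"}

# Voiced -> voiceless obstruent pairs, and the reverse direction.
DEVOICE = {"b": "p", "d": "t", "ɟ": "c", "g": "k", "v": "f", "z": "s"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def toy_czech_g2p(word: str) -> str:
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            phones.append("x")  # CH digraph
            i += 2
            continue
        if ch in PALATAL and nxt in ("i", "í"):
            phones.append(PALATAL[ch])  # palatalization
        else:
            phones.append(LETTERS.get(ch, ch))  # long vowels, passthrough
        i += 1

    # Final devoicing: a voiced obstruent at the word end becomes voiceless.
    if phones and phones[-1] in DEVOICE:
        phones[-1] = DEVOICE[phones[-1]]

    # Regressive voicing assimilation, scanning right to left.
    # "v" is devoiced by a following voiceless sound but does not itself
    # voice the preceding consonant.
    for j in range(len(phones) - 2, -1, -1):
        cur, following = phones[j], phones[j + 1]
        if cur in DEVOICE and (following in VOICE or following in ("x", "ts")):
            phones[j] = DEVOICE[cur]
        elif cur in VOICE and following in DEVOICE and following != "v":
            phones[j] = VOICE[cur]
    return "".join(phones)


for w in ("chléb", "led", "div", "kde"):
    print(w, "->", toy_czech_g2p(w))
# chléb -> xleːp
# led -> let
# div -> ɟif
# kde -> gde
```

The sketch skips ř, the remaining háček letters, and cross-word assimilation, so its output should not be compared against the real converter.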