German API

The German G2P module converts text to phonemes using a dictionary of more than 738,000 entries, with a rule-based fallback.

Main Class

class kokorog2p.de.GermanG2P(language: str = 'de-de', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'de_core_news_sm', use_lexicon: bool = True, strip_stress: bool = True, load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any)

Bases: G2PBase

German G2P converter using dictionary lookup with fallback options.

This class provides grapheme-to-phoneme conversion for German text using a large dictionary (738k+ entries) and phonological rules, falling back to espeak-ng or goruut for out-of-vocabulary words.

Example:

>>> from kokorog2p.de import GermanG2P
>>> g2p = GermanG2P()
>>> tokens = g2p("Guten Tag")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")

__init__(language: str = 'de-de', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'de_core_news_sm', use_lexicon: bool = True, strip_stress: bool = True, load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any) → None

Initialize the German G2P converter.

Args:

language: Language code (default: ‘de-de’).
use_espeak_fallback: Whether to use espeak for OOV words.
use_goruut_fallback: Whether to use goruut for OOV words.
use_spacy: Whether to use spaCy for tokenization and POS tagging. Defaults to False to preserve legacy behavior and avoid requiring spaCy model downloads unless explicitly requested.
spacy_model: spaCy German model package to load when use_spacy=True (e.g., “de_core_news_sm”, “de_core_news_md”, “de_core_news_lg”).
use_lexicon: Whether to use dictionary lookup (default: True).
strip_stress: Whether to remove stress markers from lexicon output.
load_silver: If True, load the silver tier dictionary if available. German currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
load_gold: If True, load the gold tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.
expand_abbreviations: Whether to expand abbreviations (Prof. → Professor).
enable_context_detection: Whether to enable context-aware abbreviation expansion.

Raises:

ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
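
Because use_espeak_fallback defaults to True, the constraint above means that enabling goruut also requires switching espeak off explicitly. The following is a minimal standalone sketch of such a check; `select_fallback` is a hypothetical helper written for illustration and is not part of kokorog2p, whose own validation may differ in detail.

```python
from typing import Optional


def select_fallback(use_espeak_fallback: bool = True,
                    use_goruut_fallback: bool = False) -> Optional[str]:
    """Return the name of the single active fallback backend, or None.

    Hypothetical helper that mirrors the documented constraint: the two
    fallback flags are mutually exclusive.
    """
    if use_espeak_fallback and use_goruut_fallback:
        raise ValueError(
            "use_espeak_fallback and use_goruut_fallback are mutually exclusive"
        )
    if use_espeak_fallback:
        return "espeak"
    if use_goruut_fallback:
        return "goruut"
    return None  # dictionary lookup only


print(select_fallback())                                   # espeak
print(select_fallback(use_espeak_fallback=False,
                      use_goruut_fallback=True))           # goruut
```

With the real class, the equivalent goruut configuration would be GermanG2P(use_espeak_fallback=False, use_goruut_fallback=True).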

property nlp: object: Lazily initialize spaCy.

property regex_tokenizer: RegexTokenizer: Lazily initialize the regex tokenizer.

property spacy_tokenizer: SpacyTokenizer: Lazily initialize the spaCy tokenizer.

__call__(text: str) → list[GToken]

Convert text to a list of tokens with phonemes.

Args:

text: Input text to convert.

Returns:

List of GToken objects with phonemes assigned.

lookup(word: str, tag: str | None = None) → str | None

Look up a word in the dictionary.

Args:

word: The word to look up.
tag: Optional POS tag (not used for German).

Returns:

Phoneme string if found, None otherwise.

phonemize(text: str) → str

Convert text to a phoneme string.

Args:

text: Input text to convert.

Returns:

Phoneme string.

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: the version string (“1.0” or “1.1”).

Lexicon

class kokorog2p.de.GermanLexicon(strip_stress: bool = False, load_silver: bool = True, load_gold: bool = True)

Bases: object

German pronunciation lexicon.

Uses a gold dictionary for lookup with optional fallback.

Example:

>>> from kokorog2p.de import GermanLexicon
>>> lexicon = GermanLexicon()
>>> lexicon.lookup("Haus")
'haʊ̯s'

__init__(strip_stress: bool = False, load_silver: bool = True, load_gold: bool = True) → None

Initialize the German lexicon.

Args:

strip_stress: If True, remove stress markers from phonemes.
load_silver: If True, load the silver tier dictionary if available. German currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
load_gold: If True, load the gold tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.

lookup(word: str, tag: str | None = None) → str | None

Look up a word in the lexicon.

Args:

word: The word to look up.
tag: Optional POS tag (not used for German).

Returns:

IPA phoneme string if found, None otherwise.

__call__(word: str, tag: str | None = None) → str | None

Look up a word in the lexicon.

Args:

word: The word to look up.
tag: Optional POS tag.

Returns:

IPA phoneme string if found, None otherwise.

is_known(word: str) → bool

Check if a word is in the lexicon.

Args:

word: The word to check.

Returns:

True if the word is in the lexicon.
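
Since lookup returns None for unknown words, callers typically pair it with some fallback. The sketch below shows that pattern with a plain dict standing in for the lexicon; `lookup_with_fallback` and `toy_lexicon` are illustrative names, not part of kokorog2p, and the single entry is the one from the GermanLexicon example.

```python
from typing import Callable, Optional


def lookup_with_fallback(
    word: str,
    lookup: Callable[[str], Optional[str]],
    fallback: Callable[[str], str],
) -> str:
    """Return the lexicon pronunciation, or the fallback's output on a miss."""
    phonemes = lookup(word)
    if phonemes is None:
        # Out-of-vocabulary word: defer to the fallback converter.
        return fallback(word)
    return phonemes


# Stand-in for a real lexicon; a dict's .get has the same
# "string or None" contract as the documented lookup method.
toy_lexicon = {"Haus": "haʊ̯s"}

print(lookup_with_fallback("Haus", toy_lexicon.get, lambda w: "?"))
print(lookup_with_fallback("Xylophon", toy_lexicon.get, lambda w: "?"))
```

Testing `is None` rather than truthiness keeps an empty-string pronunciation, should one ever occur, from being mistaken for a miss.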

__len__() → int: Return the number of entries in the lexicon.

__repr__() → str: Return string representation.

Number Conversion

class kokorog2p.de.numbers.GermanNumberConverter(lookup_fn: Callable[[str, str | None], str | None] | None = None)

Bases: object

Convert numbers to their German word representations.

This class handles various number formats, including:

- Cardinal numbers (1, 2, 3 -> eins, zwei, drei)
- Ordinal numbers (1., 2. -> erste, zweite)
- Years (1984 -> neunzehnhundertvierundachtzig)
- Decimals (3,14 -> drei Komma eins vier)
- Currency (12,50€ -> zwölf Euro fünfzig)

__init__(lookup_fn: Callable[[str, str | None], str | None] | None = None) → None

Initialize the German number converter.

Args:

lookup_fn: Optional function to look up words in the lexicon.

property num2words: Callable: Lazily import num2words with German language.

convert_cardinal(word: str) → str

Convert cardinal number to German words.

Args:

word: Number string (e.g., “42”, “1.000”).

Returns:

German word representation.

convert_ordinal(word: str) → str

Convert ordinal number to German words.

Args:

word: Number string (e.g., “1”, “42”).

Returns:

German ordinal word representation.

convert_year(word: str) → str

Convert year to German words.

Args:

word: Year string (e.g., “1984”, “2024”).

Returns:

German year word representation.
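
The class description gives 1984 -> neunzehnhundertvierundachtzig, which reflects the German habit of reading years before 2000 as "hundreds". The sketch below illustrates that reading for a limited range. It is not the library's implementation: `year_to_german` and `below_100` are illustrative names, and the handling of 2000-2099 ("zweitausend" plus the remainder) is an assumption based on common usage rather than on documented kokorog2p output.

```python
# Stems for 1-19 ("ein" rather than "eins", so that compounds such as
# "einundzwanzig" come out right); index 0 is unused.
UNITS = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
         "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig",
        "sechzig", "siebzig", "achtzig", "neunzig"]


def below_100(n: int) -> str:
    """Cardinal words for 1-99 as they appear at the end of a compound."""
    if n == 1:
        return "eins"
    if n < 20:
        return UNITS[n]
    tens, ones = divmod(n, 10)
    # German puts the ones digit first: 84 -> "vier" + "und" + "achtzig".
    return (UNITS[ones] + "und" if ones else "") + TENS[tens]


def year_to_german(year: int) -> str:
    """Spell out a year; this sketch covers 1100-2099 only."""
    if 1100 <= year <= 1999:
        century, rest = divmod(year, 100)
        words = UNITS[century] + "hundert"   # 19 -> "neunzehnhundert"
    elif 2000 <= year <= 2099:
        rest = year - 2000
        words = "zweitausend"                # assumed reading, see lead-in
    else:
        raise ValueError("this sketch only covers the years 1100-2099")
    return words + (below_100(rest) if rest else "")


print(year_to_german(1984))  # neunzehnhundertvierundachtzig
```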

convert_decimal(word: str) → str

Convert decimal number to German words.

German uses comma as decimal separator.

Args:

word: Decimal string (e.g., “3,14” or “3.14”).

Returns:

German word representation.
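
The documented example 3,14 -> drei Komma eins vier shows the fractional part being read digit by digit. A minimal sketch of that behavior follows; it is not the library's code. `decimal_to_german`, `single_digit`, and the `cardinal` parameter are illustrative, and the default integer-part converter covers only 0-9, so a fuller converter has to be passed in for larger values.

```python
from typing import Callable

# Spoken forms of the digits 0-9, used for the fractional part.
DIGITS = ["null", "eins", "zwei", "drei", "vier",
          "fünf", "sechs", "sieben", "acht", "neun"]


def single_digit(n: int) -> str:
    """Default integer-part converter for this sketch; covers 0-9 only."""
    if not 0 <= n <= 9:
        raise ValueError("pass a fuller `cardinal` callable for values above 9")
    return DIGITS[n]


def decimal_to_german(word: str,
                      cardinal: Callable[[int], str] = single_digit) -> str:
    """Read a decimal string such as "3,14" or "3.14" aloud in German."""
    normalized = word.replace(".", ",")  # accept either separator
    if normalized.count(",") != 1:
        raise ValueError(f"expected exactly one decimal separator: {word!r}")
    integer_part, fraction = normalized.split(",")
    if not (integer_part.isdecimal() and fraction.isdecimal()):
        raise ValueError(f"not a plain decimal number: {word!r}")
    # The fraction is spoken digit by digit, not as a whole number.
    spoken_fraction = " ".join(DIGITS[int(d)] for d in fraction)
    return f"{cardinal(int(integer_part))} Komma {spoken_fraction}"


print(decimal_to_german("3,14"))  # drei Komma eins vier
```

Note that treating "." as a decimal separator conflicts with inputs such as “1.000”, where the dot groups thousands; the sketch sidesteps this by requiring exactly one separator.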

convert_currency(word: str, currency: str) → str

Convert currency amount to German words.

Args:

word: Amount string (e.g., “12,50”).
currency: Currency symbol (e.g., “€”).

Returns:

German currency word representation.
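
The documented example 12,50€ -> zwölf Euro fünfzig suggests the pattern "whole amount, currency name, cents". The sketch below reproduces that pattern under several assumptions: `currency_to_german` and `CURRENCY_NAMES` are illustrative names, only “€” is mapped because it is the only symbol the documentation confirms, a zero cent part is assumed to be omitted, and number spelling is delegated to a caller-supplied `cardinal` function.

```python
from typing import Callable

# Only the symbol confirmed by the documentation; extend as needed.
CURRENCY_NAMES = {"€": "Euro"}


def currency_to_german(word: str, currency: str,
                       cardinal: Callable[[int], str]) -> str:
    """Spell out an amount such as "12,50" as "<whole> <name> <cents>"."""
    name = CURRENCY_NAMES[currency]  # KeyError for symbols the sketch lacks
    whole, _, cents = word.partition(",")
    words = f"{cardinal(int(whole))} {name}"
    if cents and int(cents):
        # "12,5" means 50 cents, so right-pad to two digits; anything
        # beyond two decimal places is truncated in this sketch.
        words += f" {cardinal(int(cents.ljust(2, '0')[:2]))}"
    return words


# Tiny stand-in for a real cardinal converter.
spelled = {12: "zwölf", 50: "fünfzig"}
print(currency_to_german("12,50", "€", spelled.__getitem__))  # zwölf Euro fünfzig
```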

convert(word: str, currency: str | None = None, is_ordinal: bool = False, is_year: bool = False) → str

Convert a number to its German word representation.

Args:

word: The number string to convert.
currency: Optional currency symbol (e.g., ‘€’).
is_ordinal: Whether to convert as ordinal.
is_year: Whether to convert as year.

Returns:

German word representation.

kokorog2p.de.numbers.expand_number(text: str) → str

Expand numbers in text to German words.

This is a convenience function for simple number expansion.

Args:

text: Text potentially containing numbers.

Returns:

Text with numbers expanded to German words.

kokorog2p.de.numbers.number_to_german(n: int) → str

Convert an integer to German words.

This is a fallback when num2words is not available.

Args:

n: Integer to convert.

Returns:

German word representation.
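
To make the idea of a dependency-free fallback concrete, here is a small rule-based cardinal converter. It is an independent sketch, not the source of number_to_german: the name `number_to_german_sketch`, the supported range (below one million), and the "minus" prefix for negative values are all choices made for this illustration.

```python
# Stems for 1-19; "ein" (not "eins") is the form used inside compounds.
_UNITS = ["", "ein", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
          "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
          "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
_TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig",
         "sechzig", "siebzig", "achtzig", "neunzig"]


def _below_1000(n: int, standalone: bool) -> str:
    """Words for 1-999; a trailing 1 is "eins" only when standalone."""
    hundreds, rest = divmod(n, 100)
    words = (_UNITS[hundreds] + "hundert") if hundreds else ""
    if rest == 0:
        return words
    if rest == 1:
        return words + ("eins" if standalone else "ein")
    if rest < 20:
        return words + _UNITS[rest]
    tens, ones = divmod(rest, 10)
    if ones:
        words += _UNITS[ones] + "und"  # ones come before tens in German
    return words + _TENS[tens]


def number_to_german_sketch(n: int) -> str:
    """Spell out an integer with absolute value below one million."""
    if n == 0:
        return "null"
    if n < 0:
        return "minus " + number_to_german_sketch(-n)
    if n >= 1_000_000:
        raise ValueError("this sketch only covers values below one million")
    thousands, rest = divmod(n, 1000)
    # Before "tausend" the compound form is used: 1000 -> "eintausend".
    words = (_below_1000(thousands, standalone=False) + "tausend") if thousands else ""
    if rest:
        words += _below_1000(rest, standalone=True)
    return words


print(number_to_german_sketch(42))    # zweiundvierzig
print(number_to_german_sketch(1984))  # eintausendneunhundertvierundachtzig
```

Compare the second result with the year reading of the same digits given in the class description: a cardinal reading and a year reading of “1984” differ, which is why convert takes a separate is_year flag.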

kokorog2p.de.numbers.ordinal_to_german(n: int) → str

Convert an integer to German ordinal words.

Args:

n: Integer to convert.

Returns:

German ordinal word representation.

Examples

from kokorog2p.de import GermanG2P

g2p = GermanG2P(language="de-de")
tokens = g2p("Guten Tag, wie geht es Ihnen?")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")