German API
The German G2P module converts text to phonemes using a dictionary of more than 738k entries, with a rule-based fallback.
Main Class
- class kokorog2p.de.GermanG2P(language: str = 'de-de', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'de_core_news_sm', use_lexicon: bool = True, strip_stress: bool = True, load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any)
Bases: G2PBase
German G2P converter using dictionary lookup with fallback options.
This class provides grapheme-to-phoneme conversion for German text using a large dictionary (738k+ entries) with fallback to espeak-ng or goruut for out-of-vocabulary words and phonological rules.
- Example:
>>> g2p = GermanG2P()
>>> tokens = g2p("Guten Tag")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
- __init__(language: str = 'de-de', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'de_core_news_sm', use_lexicon: bool = True, strip_stress: bool = True, load_silver: bool = True, load_gold: bool = True, version: str = '1.0', expand_abbreviations: bool = True, enable_context_detection: bool = True, **kwargs: Any) -> None
Initialize the German G2P converter.
- Args:
language: Language code (default: 'de-de').
use_espeak_fallback: Whether to use espeak for OOV words.
use_goruut_fallback: Whether to use goruut for OOV words.
use_spacy: Whether to use spaCy for tokenization and POS tagging. Defaults to False to preserve legacy behavior and avoid requiring spaCy model downloads unless explicitly requested.
spacy_model: spaCy German model package to load when use_spacy=True (e.g., "de_core_news_sm", "de_core_news_md", "de_core_news_lg").
use_lexicon: Whether to use dictionary lookup (default: True).
strip_stress: Whether to remove stress markers from lexicon output.
load_silver: If True, load the silver-tier dictionary if available. German currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
load_gold: If True, load the gold-tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.
expand_abbreviations: Whether to expand abbreviations (Prof. → Professor).
enable_context_detection: Whether to enable context-aware abbreviation expansion.
- Raises:
ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
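The constraint under Raises can be illustrated with a small standalone check. The helper name `check_fallbacks` and its return values are hypothetical and not part of kokorog2p; the sketch only mirrors the documented rule that the two fallback flags cannot both be True. Because use_espeak_fallback defaults to True, selecting goruut presumably requires passing use_espeak_fallback=False as well.

```python
from typing import Optional


def check_fallbacks(use_espeak_fallback: bool = True,
                    use_goruut_fallback: bool = False) -> Optional[str]:
    """Hypothetical helper: return the selected fallback name, or None.

    Mirrors the documented rule that both flags must not be True at once.
    """
    if use_espeak_fallback and use_goruut_fallback:
        raise ValueError(
            "use_espeak_fallback and use_goruut_fallback are mutually exclusive"
        )
    if use_espeak_fallback:
        return "espeak"
    if use_goruut_fallback:
        return "goruut"
    return None  # no fallback: OOV words stay unresolved in this sketch
```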
- property regex_tokenizer: RegexTokenizer
Lazily initialize the regex tokenizer.
- property spacy_tokenizer: SpacyTokenizer
Lazily initialize the spaCy tokenizer.
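Lazy initialization means the tokenizer is built on first access rather than in the constructor. The following is a generic sketch of that pattern, not the library's code: the class `LazyTokenizerHolder` is hypothetical, and a compiled regular expression stands in for the real RegexTokenizer.

```python
import re


class LazyTokenizerHolder:
    """Hypothetical class showing a lazily initialized tokenizer property."""

    def __init__(self) -> None:
        self._regex_tokenizer = None  # nothing is built at construction time
        self.build_count = 0          # counts how often the tokenizer is built

    @property
    def regex_tokenizer(self):
        # Build on first access only; later accesses reuse the cached object.
        if self._regex_tokenizer is None:
            self.build_count += 1
            self._regex_tokenizer = re.compile(r"\w+|[^\w\s]")
        return self._regex_tokenizer
```

A likely benefit of this design is that optional dependencies such as a spaCy model are only loaded when the corresponding tokenizer is actually used.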
- __call__(text: str) -> list[GToken]
Convert text to a list of tokens with phonemes.
- Args:
text: Input text to convert.
- Returns:
List of GToken objects with phonemes assigned.
- lookup(word: str, tag: str | None = None) -> str | None
Look up a word in the dictionary.
- Args:
word: The word to look up.
tag: Optional POS tag (not used for German).
- Returns:
Phoneme string if found, None otherwise.
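The class description mentions dictionary lookup with a fallback for out-of-vocabulary words. A minimal standalone sketch of that flow follows. The names `phonemize` and `TOY_LEXICON`, and the lowercasing step, are illustrative assumptions and do not reflect kokorog2p internals.

```python
from typing import Callable, Dict, Optional

# Toy lexicon; the transcription for "Haus" is the one shown in the
# GermanLexicon example in this reference.
TOY_LEXICON: Dict[str, str] = {"haus": "haʊ̯s"}


def phonemize(word: str,
              lexicon: Dict[str, str],
              fallback: Optional[Callable[[str], str]] = None) -> Optional[str]:
    """Return lexicon phonemes if present, otherwise ask the fallback."""
    phonemes = lexicon.get(word.lower())  # lowercasing is an assumption
    if phonemes is not None:
        return phonemes
    if fallback is not None:
        return fallback(word)  # e.g. a wrapper around an external phonemizer
    return None  # mirrors lookup(): None when nothing is found
```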
Lexicon
- class kokorog2p.de.GermanLexicon(strip_stress: bool = False, load_silver: bool = True, load_gold: bool = True)
Bases: object
German pronunciation lexicon.
Uses a gold dictionary for lookup with optional fallback.
- Example:
>>> lexicon = GermanLexicon()
>>> lexicon.lookup("Haus")
'haʊ̯s'
- __init__(strip_stress: bool = False, load_silver: bool = True, load_gold: bool = True) -> None
Initialize the German lexicon.
- Args:
strip_stress: If True, remove stress markers from phonemes.
load_silver: If True, load the silver-tier dictionary if available. German currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
load_gold: If True, load the gold-tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.
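To make the strip_stress option concrete, here is a minimal sketch of removing stress markers from an IPA string. It assumes the markers in question are the IPA primary (ˈ, U+02C8) and secondary (ˌ, U+02CC) stress signs; the function name `strip_stress_marks` is hypothetical, and the library may handle additional symbols.

```python
# Assumed stress markers: IPA primary and secondary stress.
STRESS_MARKS = "\u02c8\u02cc"

# Translation table that deletes every character in STRESS_MARKS.
_STRIP_TABLE = str.maketrans("", "", STRESS_MARKS)


def strip_stress_marks(phonemes: str) -> str:
    """Return the phoneme string with stress markers removed."""
    return phonemes.translate(_STRIP_TABLE)
```

Strings without stress markers pass through unchanged, so the function is safe to apply unconditionally.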
- lookup(word: str, tag: str | None = None) -> str | None
Look up a word in the lexicon.
- Args:
word: The word to look up.
tag: Optional POS tag (not used for German).
- Returns:
IPA phoneme string if found, None otherwise.
- __call__(word: str, tag: str | None = None) -> str | None
Look up a word in the lexicon.
- Args:
word: The word to look up.
tag: Optional POS tag.
- Returns:
IPA phoneme string if found, None otherwise.
Number Conversion
- class kokorog2p.de.numbers.GermanNumberConverter(lookup_fn: Callable[[str, str | None], str | None] | None = None)
Bases: object
Convert numbers to their German word representations.
This class handles various number formats, including:
- Cardinal numbers (1, 2, 3 -> eins, zwei, drei)
- Ordinal numbers (1., 2. -> erste, zweite)
- Years (1984 -> neunzehnhundertvierundachtzig)
- Decimals (3,14 -> drei Komma eins vier)
- Currency (12,50€ -> zwölf Euro fünfzig)
- __init__(lookup_fn: Callable[[str, str | None], str | None] | None = None) -> None
Initialize the German number converter.
- Args:
lookup_fn: Optional function to look up words in the lexicon.
- convert_cardinal(word: str) -> str
Convert cardinal number to German words.
- Args:
word: Number string (e.g., “42”, “1.000”).
- Returns:
German word representation.
- convert_ordinal(word: str) -> str
Convert ordinal number to German words.
- Args:
word: Number string (e.g., “1”, “42”).
- Returns:
German ordinal word representation.
- convert_year(word: str) -> str
Convert year to German words.
- Args:
word: Year string (e.g., “1984”, “2024”).
- Returns:
German year word representation.
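The cardinal and year conversions can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the converter's actual code: the function names are hypothetical, the range is limited to 0 through 9999, "einhundert" and "eintausend" are used for 100 and 1000, and only years from 1100 to 1999 are read as hundreds. The library may make different choices in each of these cases.

```python
UNITS = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
         "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}


def cardinal(n: int) -> str:
    """German cardinal for 0 <= n <= 9999 (sketch, see assumptions above)."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if n < 20:
        return UNITS[n]
    if n < 100:
        tens, unit = divmod(n, 10)
        if unit == 0:
            return TENS[tens]
        # Units come first, joined with "und"; "eins" shortens to "ein".
        unit_word = "ein" if unit == 1 else UNITS[unit]
        return f"{unit_word}und{TENS[tens]}"
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ("ein" if hundreds == 1 else UNITS[hundreds]) + "hundert"
        return head + (cardinal(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    head = ("ein" if thousands == 1 else UNITS[thousands]) + "tausend"
    return head + (cardinal(rest) if rest else "")


def cardinal_from_text(word: str) -> str:
    """Accept strings such as "42" or "1.000" (dot as thousands separator)."""
    return cardinal(int(word.replace(".", "")))


def year(n: int) -> str:
    """Read 1100..1999 as hundreds; otherwise fall back to the cardinal."""
    if 1100 <= n <= 1999:
        hundreds, rest = divmod(n, 100)
        return cardinal(hundreds) + "hundert" + (cardinal(rest) if rest else "")
    return cardinal(n)
```

With these definitions, `year(1984)` yields the form given in the class description, while `cardinal(1984)` yields the longer "eintausend..." reading.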
- convert_decimal(word: str) -> str
Convert decimal number to German words.
German uses comma as decimal separator.
- Args:
word: Decimal string (e.g., “3,14” or “3.14”).
- Returns:
German word representation.
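The decimal format from the class description (3,14 -> drei Komma eins vier) reads the fractional part digit by digit. A minimal sketch of that behavior follows. The name `decimal_to_words` and the optional `integer_to_words` callback are assumptions for illustration; without the callback, the sketch only handles a single-digit integer part.

```python
from typing import Callable, Optional

DIGITS = ["null", "eins", "zwei", "drei", "vier",
          "fünf", "sechs", "sieben", "acht", "neun"]


def decimal_to_words(
    word: str,
    integer_to_words: Optional[Callable[[int], str]] = None,
) -> str:
    """Spell a decimal: integer part, "Komma", then each fractional digit."""
    # Accept both "3,14" and "3.14" by normalizing the separator to a comma.
    integer_part, _, fraction = word.replace(".", ",").partition(",")
    if not (integer_part.isdecimal() and fraction.isdecimal()):
        raise ValueError(f"not a decimal number: {word!r}")
    if integer_to_words is not None:
        head = integer_to_words(int(integer_part))
    elif len(integer_part) == 1:
        head = DIGITS[int(integer_part)]
    else:
        raise ValueError("pass integer_to_words for multi-digit integer parts")
    # The fractional part is read one digit at a time.
    tail = " ".join(DIGITS[int(d)] for d in fraction)
    return f"{head} Komma {tail}"
```

Because a dot is treated as a decimal separator here, inputs that combine a thousands separator with a decimal comma (such as "1.000,5") are rejected rather than guessed at.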
- convert_currency(word: str, currency: str) -> str
Convert currency amount to German words.
- Args:
word: Amount string (e.g., “12,50”).
currency: Currency symbol (e.g., “€”).
- Returns:
German currency word representation.
- convert(word: str, currency: str | None = None, is_ordinal: bool = False, is_year: bool = False) -> str
Convert a number to its German word representation.
- Args:
word: The number string to convert.
currency: Optional currency symbol (e.g., '€').
is_ordinal: Whether to convert as ordinal.
is_year: Whether to convert as year.
- Returns:
German word representation.
- kokorog2p.de.numbers.expand_number(text: str) -> str
Expand numbers in text to German words.
This is a convenience function for simple number expansion.
- Args:
text: Text potentially containing numbers.
- Returns:
Text with numbers expanded to German words.
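Expanding numbers inside running text can be sketched as a regular-expression substitution with a callback. This is only an illustration of the general idea: the helper `expand_numbers_in_text` and its `convert` parameter are hypothetical, and the real function may recognize more formats (decimals, ordinals, currency) than plain digit runs.

```python
import re
from typing import Callable


def expand_numbers_in_text(text: str, convert: Callable[[str], str]) -> str:
    """Replace each run of digits in text with convert(run)."""
    return re.sub(r"\d+", lambda match: convert(match.group(0)), text)


# Toy converter for the usage example; a real one would cover all numbers.
TOY_WORDS = {"2": "zwei", "3": "drei"}

expanded = expand_numbers_in_text(
    "Ich habe 2 Katzen und 3 Hunde", TOY_WORDS.__getitem__
)
```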
Examples
from kokorog2p.de import GermanG2P

# Create a converter for standard German.
g2p = GermanG2P(language="de-de")
tokens = g2p("Guten Tag, wie geht es Ihnen?")

# Print each token alongside its phonemes.
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")