French API
French G2P provides phoneme conversion using a gold dictionary with espeak-ng fallback.
Main Class
- class kokorog2p.fr.FrenchG2P(language: str = 'fr-fr', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = True, spacy_model: str = 'fr_core_news_sm', expand_nums: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs)
Bases: G2PBase
French G2P converter using dictionary lookup with fallback options.
This class provides grapheme-to-phoneme conversion for French text. It looks words up in a gold dictionary and uses espeak-ng or goruut as a fallback for out-of-vocabulary (OOV) words.
- Example:
>>> g2p = FrenchG2P()
>>> tokens = g2p("Bonjour, comment allez-vous?")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
- __init__(language: str = 'fr-fr', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = True, spacy_model: str = 'fr_core_news_sm', expand_nums: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs) -> None
Initialize the French G2P converter.
- Args:
language: Language code (default: 'fr-fr').
use_espeak_fallback: Whether to use espeak-ng for OOV words.
use_goruut_fallback: Whether to use goruut for OOV words.
use_spacy: Whether to use spaCy for tokenization and POS tagging.
spacy_model: spaCy French model package to load when use_spacy=True (e.g., "fr_core_news_sm", "fr_core_news_md", "fr_core_news_lg").
expand_nums: Whether to expand numbers to words.
expand_abbreviations: Whether to expand common abbreviations.
enable_context_detection: Whether to use context-aware abbreviation expansion.
unk: Character to use for unknown words when fallback is disabled.
load_silver: If True, load the silver-tier dictionary if available. French currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
load_gold: If True, load the gold-tier dictionary. Defaults to True for maximum quality and coverage. Set to False when the fastest possible initialization is needed.
- Raises:
ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
- property fallback: FrenchFallback | FrenchGoruutFallback | None
Lazily initialize the appropriate fallback.
- property regex_tokenizer: RegexTokenizer
Lazily initialize the regex tokenizer.
- property spacy_tokenizer: SpacyTokenizer
Lazily initialize the spaCy tokenizer.
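The constructor check and the lazy properties described above can be illustrated with a small stand-in class. This is only a sketch of the pattern, not the kokorog2p implementation: `FallbackConfigSketch`, its `fallback_builds` counter, and the placeholder backends that return tagged strings are all hypothetical names introduced for illustration.

```python
from functools import cached_property
from typing import Callable, Optional


class FallbackConfigSketch:
    """Hypothetical stand-in showing flag validation and lazy initialization."""

    def __init__(
        self,
        use_espeak_fallback: bool = True,
        use_goruut_fallback: bool = False,
    ) -> None:
        # Mirror the documented rule: the two fallbacks are mutually exclusive.
        if use_espeak_fallback and use_goruut_fallback:
            raise ValueError(
                "Enable at most one of use_espeak_fallback and use_goruut_fallback."
            )
        self.use_espeak_fallback = use_espeak_fallback
        self.use_goruut_fallback = use_goruut_fallback
        self.fallback_builds = 0  # counts how often the backend is constructed

    @cached_property
    def fallback(self) -> Optional[Callable[[str], str]]:
        # Runs on first access only; the result is cached on the instance.
        self.fallback_builds += 1
        if self.use_espeak_fallback:
            return lambda word: f"<espeak:{word}>"  # placeholder backend
        if self.use_goruut_fallback:
            return lambda word: f"<goruut:{word}>"  # placeholder backend
        return None


config = FallbackConfigSketch()
print(config.fallback_builds)   # nothing has been built yet
print(config.fallback("mot"))   # first access builds the backend
print(config.fallback_builds)
```

Deferring construction this way keeps object creation cheap when a backend is never needed, which is one plausible reason for exposing the fallback and tokenizers as properties.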
- __call__(text: str) -> list[GToken]
Convert text to a list of tokens with phonemes.
- Args:
text: Input text to convert.
- Returns:
List of GToken objects with phonemes assigned.
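The overall flow (dictionary lookup first, then a fallback for OOV words, then the `unk` marker when no fallback is enabled) might look roughly like the sketch below. It is self-contained and does not call kokorog2p: `TokenSketch`, `g2p_sketch`, the regex tokenization, the pass-through handling of punctuation, and the sample phoneme strings are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TokenSketch:
    """Hypothetical stand-in for a token carrying text and phonemes."""

    text: str
    phonemes: Optional[str] = None


def g2p_sketch(
    text: str,
    gold: dict,
    fallback: Optional[Callable[[str], str]] = None,
    unk: str = "?",
) -> list:
    tokens = []
    # Split into runs of word characters and single punctuation marks.
    for piece in re.findall(r"\w+|[^\w\s]", text):
        if not re.match(r"\w", piece):
            phonemes = piece  # assumption: punctuation passes through unchanged
        else:
            phonemes = gold.get(piece.lower())  # 1. dictionary lookup
            if phonemes is None:
                # 2. fallback for OOV words, or 3. the unk marker
                phonemes = fallback(piece) if fallback else unk
        tokens.append(TokenSketch(piece, phonemes))
    return tokens


# Illustrative entries only, not taken from the real gold dictionary.
gold = {"bonjour": "bɔ̃ʒuʁ", "monde": "mɔ̃d"}
for token in g2p_sketch("Bonjour le monde!", gold):
    print(f"{token.text} -> {token.phonemes}")
```

Here "le" is missing from the toy dictionary and no fallback is passed, so it receives the `unk` marker.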
Lexicon
- class kokorog2p.fr.FrenchLexicon(load_silver: bool = True, load_gold: bool = True)
Bases: object
Dictionary-based G2P lookup for French with gold dictionary.
- __init__(load_silver: bool = True, load_gold: bool = True) -> None
Initialize the French lexicon.
- Args:
- load_silver: If True, load the silver-tier dictionary if available.
French currently has only a gold dictionary, so this parameter is reserved for future use and kept for consistency with English. Defaults to True.
- load_gold: If True, load the gold-tier dictionary.
Defaults to True for maximum quality and coverage. Set to False when the fastest possible initialization is needed.
- lookup(word: str, tag: str | None = None, ctx: TokenContext | None = None) -> tuple[str | None, int | None]
Look up a word in the lexicon.
- Args:
word: Word to look up.
tag: POS tag (optional).
ctx: Token context (optional).
- Returns:
Tuple of (phonemes, rating) or (None, None) if not found.
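The return contract, a `(phonemes, rating)` pair or `(None, None)` on a miss, can be mimicked with a toy lexicon to show how a caller might branch on the result. `LexiconSketch`, the rating value `4`, the case-insensitive matching, and the sample entry are hypothetical; the ratings the library actually assigns are not stated here.

```python
from __future__ import annotations


class LexiconSketch:
    """Hypothetical stand-in that mirrors the documented lookup() return shape."""

    GOLD_RATING = 4  # assumed value, for illustration only

    def __init__(self, gold: dict[str, str]) -> None:
        self._gold = {word.lower(): phonemes for word, phonemes in gold.items()}

    def lookup(
        self, word: str, tag: str | None = None, ctx: object | None = None
    ) -> tuple[str | None, int | None]:
        # tag and ctx are accepted for signature parity but unused in this sketch.
        phonemes = self._gold.get(word.lower())
        if phonemes is None:
            return (None, None)
        return (phonemes, self.GOLD_RATING)


lexicon = LexiconSketch({"monde": "mɔ̃d"})  # illustrative entry
for word in ("Monde", "xyz"):
    phonemes, rating = lexicon.lookup(word)
    if phonemes is None:
        print(f"{word}: not found, hand over to a fallback")
    else:
        print(f"{word}: {phonemes} (rating {rating})")
```

Testing `phonemes is None` rather than truthiness keeps a miss distinct from an entry that legitimately maps to an empty string.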
Number Conversion
Helper Functions
- kokorog2p.fr.numbers.number_to_french(n: int, ordinal: bool = False) -> str
Convert a number to French words using num2words.
- Args:
n: Integer to convert.
ordinal: If True, return ordinal form (premier, deuxième, etc.).
- Returns:
French word representation.
- Raises:
ImportError: If num2words is not installed.
- Example:
>>> number_to_french(42)
'quarante-deux'
>>> number_to_french(1, ordinal=True)
'premier'
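Since the function is described as a wrapper around num2words, a thin wrapper of that kind could be sketched as follows. The name `number_to_french_sketch` and the wording of the error message are assumptions; only the `num2words(n, lang=..., to=...)` call comes from the num2words package itself.

```python
def number_to_french_sketch(n: int, ordinal: bool = False) -> str:
    """Hypothetical wrapper; the real function may differ in detail."""
    try:
        # Imported lazily so the dependency is only needed when numbers occur.
        from num2words import num2words
    except ImportError as exc:
        raise ImportError(
            "num2words is required for number expansion: pip install num2words"
        ) from exc
    # num2words selects the output form through its `to` argument.
    return num2words(n, lang="fr", to="ordinal" if ordinal else "cardinal")


if __name__ == "__main__":
    try:
        print(number_to_french_sketch(42))
        print(number_to_french_sketch(1, ordinal=True))
    except ImportError as error:
        print(error)
```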
- kokorog2p.fr.numbers.expand_numbers(text: str, max_value: int = 1000000) -> str
Expand numbers in text to French words.
- Args:
text: Text containing numbers.
max_value: Maximum value to expand (larger numbers kept as-is).
- Returns:
Text with numbers expanded.
- Example:
>>> expand_numbers("J'ai 3 pommes et 42 oranges.")
"J'ai trois pommes et quarante-deux oranges."
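One way to picture the `max_value` cutoff is a regex substitution that converts each run of digits unless it exceeds the limit. The sketch below is not the library's code: `expand_numbers_sketch` and its `to_words` parameter are hypothetical, and a tiny lookup table stands in for a real converter so the example runs without extra dependencies. It treats every digit run as a plain integer, so decimals and thousands separators are out of scope.

```python
import re
from typing import Callable


def expand_numbers_sketch(
    text: str,
    to_words: Callable[[int], str],
    max_value: int = 1_000_000,
) -> str:
    def replace(match: re.Match) -> str:
        value = int(match.group())
        if value > max_value:
            return match.group()  # larger numbers are kept as-is
        return to_words(value)

    return re.sub(r"\d+", replace, text)


# Stand-in converter covering only the numbers used below.
words = {3: "trois", 42: "quarante-deux"}.__getitem__

print(expand_numbers_sketch("J'ai 3 pommes et 42 oranges.", words))
print(expand_numbers_sketch("J'ai 3 pommes et 42 oranges.", words, max_value=10))
```

With `max_value=10`, the second call leaves "42" untouched while still expanding "3".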
- kokorog2p.fr.numbers.expand_time(text: str) -> str
Expand time expressions like 14h30.
- Args:
text: Text containing time expressions.
- Returns:
Text with times expanded.
- Example:
>>> expand_time("Le rendez-vous est à 14h30.")
'Le rendez-vous est à quatorze heures trente.'
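A pattern-based expansion of the `14h30` notation could look like the sketch below. It is an illustration under stated assumptions, not the library's implementation: `expand_time_sketch` and `to_words` are hypothetical names, the regex only recognizes the `HhMM` and `Hh` forms, and gender agreement is handled only for `1h` ("une heure"), so values such as 21h or a minute count of 1 would need extra care.

```python
import re
from typing import Callable


def expand_time_sketch(text: str, to_words: Callable[[int], str]) -> str:
    def replace(match: re.Match) -> str:
        hours = int(match.group(1))
        minutes = int(match.group(2) or 0)
        # "heure" is feminine, so 1h reads "une heure" rather than "un heure".
        hour_word = "une" if hours == 1 else to_words(hours)
        unit = "heure" if hours < 2 else "heures"
        spoken = f"{hour_word} {unit}"
        if minutes:
            spoken += f" {to_words(minutes)}"
        return spoken

    # One or two digits, a literal "h", then optionally two digits of minutes.
    return re.sub(r"\b(\d{1,2})h(\d{2})?\b", replace, text)


# Stand-in converter covering only the numbers used below.
words = {9: "neuf", 14: "quatorze", 30: "trente"}.__getitem__

print(expand_time_sketch("Le rendez-vous est à 14h30.", words))
print(expand_time_sketch("Réveil à 9h00, sieste à 1h.", words))
```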
- kokorog2p.fr.numbers.expand_currency(text: str) -> str
Expand currency amounts.
- Args:
text: Text containing currency amounts.
- Returns:
Text with currency expanded.
- Example:
>>> expand_currency("Ça coûte 5€.")
'Ça coûte cinq euros.'
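Currency expansion can be sketched in the same regex style. The version below handles only whole euro amounts written as digits followed by `€`; which currencies and formats the library itself supports is not stated here. `expand_currency_sketch` and `to_words` are hypothetical names, and amounts with a decimal part (such as `5,50€`) are deliberately left untouched rather than guessed at.

```python
import re
from typing import Callable


def expand_currency_sketch(text: str, to_words: Callable[[int], str]) -> str:
    def replace(match: re.Match) -> str:
        amount = int(match.group(1))
        unit = "euro" if amount < 2 else "euros"  # singular for 0 and 1
        return f"{to_words(amount)} {unit}"

    # The lookbehind skips digits that continue a longer or decimal number.
    return re.sub(r"(?<![\d,.])(\d+)\s?€", replace, text)


# Stand-in converter covering only the numbers used below.
words = {1: "un", 5: "cinq"}.__getitem__

print(expand_currency_sketch("Ça coûte 5€.", words))
print(expand_currency_sketch("Il reste 1 € et 5,50€.", words))
```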
Examples
from kokorog2p.fr import FrenchG2P

g2p = FrenchG2P(language="fr-fr")
tokens = g2p("Bonjour le monde!")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")