French API

French G2P converts text to phonemes using a gold dictionary, with espeak-ng (the default) or goruut as the fallback for out-of-vocabulary words.

Main Class

class kokorog2p.fr.FrenchG2P(language: str = 'fr-fr', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = True, spacy_model: str = 'fr_core_news_sm', expand_nums: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs)[source]

Bases: G2PBase

French G2P converter using dictionary lookup with fallback options.

This class provides grapheme-to-phoneme conversion for French text. Words are looked up in a gold dictionary first; out-of-vocabulary (OOV) words are passed to espeak-ng or goruut as a fallback.

Example:

>>> from kokorog2p.fr import FrenchG2P
>>> g2p = FrenchG2P()
>>> tokens = g2p("Bonjour, comment allez-vous?")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
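The lookup order described above (dictionary first, then a fallback for OOV words, then the `unk` marker when no fallback is enabled) can be illustrated with a minimal sketch. This is not the library's implementation: `TOY_GOLD`, `toy_fallback`, and `phonemize_word` are hypothetical names, and the phoneme strings are illustrative placeholders.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a gold dictionary; phoneme strings are illustrative.
TOY_GOLD = {"bonjour": "bɔ̃ʒuʁ", "monde": "mɔ̃d"}


def toy_fallback(word: str) -> str:
    # Placeholder for an external engine such as espeak-ng or goruut.
    return f"<{word}>"


def phonemize_word(
    word: str,
    fallback: Optional[Callable[[str], str]] = None,
    unk: str = "?",
) -> str:
    """Return dictionary phonemes, else the fallback result, else `unk`."""
    hit = TOY_GOLD.get(word.lower())
    if hit is not None:
        return hit
    if fallback is not None:
        return fallback(word)
    return unk
```

With this sketch, a known word returns its dictionary entry, an unknown word with `fallback=toy_fallback` returns the fallback output, and an unknown word without a fallback returns `unk`.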

__init__(language: str = 'fr-fr', use_espeak_fallback: bool = True, use_goruut_fallback: bool = False, use_spacy: bool = True, spacy_model: str = 'fr_core_news_sm', expand_nums: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, unk: str = '?', load_silver: bool = True, load_gold: bool = True, version: str = '1.0', **kwargs) → None[source]

Initialize the French G2P converter.

Args:

language: Language code (default: 'fr-fr').
use_espeak_fallback: Whether to use espeak for OOV words.
use_goruut_fallback: Whether to use goruut for OOV words.
use_spacy: Whether to use spaCy for tokenization and POS tagging.
spacy_model: spaCy French model package to load when use_spacy=True (e.g., "fr_core_news_sm", "fr_core_news_md", "fr_core_news_lg").
expand_nums: Whether to expand numbers to words.
expand_abbreviations: Whether to expand common abbreviations.
enable_context_detection: Context-aware abbreviation expansion.
unk: Character to use for unknown words when fallback is disabled.
load_silver: If True, load silver tier dictionary if available. Currently French only has gold dictionary, so this parameter is reserved for future use and consistency with English. Defaults to True for consistency.
load_gold: If True, load gold tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.

Raises:

ValueError: If both use_espeak_fallback and use_goruut_fallback are True.
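Because use_espeak_fallback defaults to True, the rule above implies that selecting goruut requires passing use_espeak_fallback=False explicitly. The following sketch mirrors that documented rule in isolation; `check_fallback_flags` is a hypothetical helper written for illustration, not a function of the library.

```python
def check_fallback_flags(
    use_espeak_fallback: bool = True,
    use_goruut_fallback: bool = False,
) -> str:
    """Mirror the documented rule: at most one fallback may be enabled."""
    if use_espeak_fallback and use_goruut_fallback:
        raise ValueError("Enable only one of the espeak and goruut fallbacks.")
    if use_goruut_fallback:
        return "goruut"
    if use_espeak_fallback:
        return "espeak"
    return "none"  # no fallback: unknown words would get the `unk` marker
```

Under this sketch, `check_fallback_flags(use_goruut_fallback=True)` raises ValueError because the espeak default is still on, while `check_fallback_flags(use_espeak_fallback=False, use_goruut_fallback=True)` selects goruut.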

property fallback: FrenchFallback | FrenchGoruutFallback | None: Lazily initialize the appropriate fallback.

property nlp: object: Lazily initialize spaCy.

property regex_tokenizer: RegexTokenizer: Lazily initialize the regex tokenizer.

property spacy_tokenizer: SpacyTokenizer: Lazily initialize the spaCy tokenizer.
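The four properties above are described as lazily initialized, which usually means the underlying resource is built on first access and reused afterwards. A minimal sketch of that pattern follows; `LazyResources` and its attributes are hypothetical and only illustrate the idea, not how FrenchG2P is actually written.

```python
from typing import List, Optional


class LazyResources:
    """Toy class showing the lazy-property pattern."""

    def __init__(self) -> None:
        self._tokenizer: Optional[List[str]] = None
        self.load_count = 0  # counts how often the expensive load ran

    @property
    def tokenizer(self) -> List[str]:
        # Build the resource on first access, then reuse the cached object.
        if self._tokenizer is None:
            self.load_count += 1
            self._tokenizer = ["expensive", "resource"]
        return self._tokenizer
```

The practical consequence of this pattern is that constructing the object stays cheap, and the cost of loading (for example, a spaCy model) is paid only by callers that actually touch the property.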

__call__(text: str) → list[GToken][source]

Convert text to a list of tokens with phonemes.

Args:

text: Input text to convert.

Returns:

List of GToken objects with phonemes assigned.

lookup(word: str, tag: str | None = None) → str | None[source]

Look up a word in the dictionary.

Args:

word: The word to look up.
tag: Optional POS tag for disambiguation.

Returns:

Phoneme string or None if not found.

get_target_model() → str[source]

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: version string ("1.1" or "1.0").

Lexicon

class kokorog2p.fr.FrenchLexicon(load_silver: bool = True, load_gold: bool = True)[source]

Bases: object

Dictionary-based G2P lookup for French, backed by a gold dictionary.

__init__(load_silver: bool = True, load_gold: bool = True) → None[source]

Initialize the French lexicon.

Args:

load_silver: If True, load silver tier dictionary if available. Currently French only has gold dictionary, so this parameter is reserved for future use and consistency with English. Defaults to True for consistency.
load_gold: If True, load gold tier dictionary. Defaults to True for maximum quality and coverage. Set to False when ultra-fast initialization is needed.

is_known(word: str, tag: str | None = None) → bool[source]: Check if a word is in the lexicon.

lookup(word: str, tag: str | None = None, ctx: TokenContext | None = None) → tuple[str | None, int | None][source]

Look up a word in the lexicon.

Args:

word: Word to look up.
tag: POS tag (optional).
ctx: Token context (optional).

Returns:

Tuple of (phonemes, rating) or (None, None) if not found.
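Since the lexicon lookup returns a two-element tuple even on a miss, callers can always unpack the result and then test the first element against None. The sketch below shows that handling with a stand-in function; `toy_lookup`, `phonemes_or_default`, and the rating value are hypothetical and chosen only to match the documented return shape.

```python
from typing import Optional, Tuple

LookupResult = Tuple[Optional[str], Optional[int]]


def toy_lookup(word: str) -> LookupResult:
    """Hypothetical stand-in with the documented return shape."""
    entries = {"monde": ("mɔ̃d", 5)}  # rating value is illustrative
    return entries.get(word.lower(), (None, None))


def phonemes_or_default(word: str, default: str = "?") -> str:
    phonemes, _rating = toy_lookup(word)
    # Compare against None rather than truthiness so that an empty
    # phoneme string is not mistaken for a miss.
    return default if phonemes is None else phonemes
```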

expand_abbreviation(text: str) → str[source]: Expand common French abbreviations.

expand_ordinals(text: str) → str[source]: Expand ordinal numbers.

get_special_case(word: str, tag: str | None, ctx: TokenContext | None) → tuple[str | None, int | None][source]: Handle special case words with context-dependent pronunciations.

static normalize_word(word: str) → str[source]: Normalize a word for lookup.

__call__(word: str, tag: str | None = None, ctx: TokenContext | None = None) → tuple[str | None, int | None][source]

Look up phonemes for a word.

Args:

word: Word to look up.
tag: POS tag.
ctx: Token context.

Returns:

Tuple of (phonemes, rating) or (None, None) if not found.

Number Conversion

Helper Functions

kokorog2p.fr.numbers.number_to_french(n: int, ordinal: bool = False) → str[source]

Convert a number to French words using num2words.

Args:

n: Integer to convert.
ordinal: If True, return ordinal form (premier, deuxième, etc.).

Returns:

French word representation.

Raises:

ImportError: If num2words is not installed.

Example:

>>> number_to_french(42)
'quarante-deux'
>>> number_to_french(1, ordinal=True)
'premier'

kokorog2p.fr.numbers.expand_numbers(text: str, max_value: int = 1000000) → str[source]

Expand numbers in text to French words.

Args:

text: Text containing numbers.
max_value: Maximum value to expand (larger numbers kept as-is).

Returns:

Text with numbers expanded.

Example:

>>> expand_numbers("J'ai 3 pommes et 42 oranges.")
"J'ai trois pommes et quarante-deux oranges."

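One plausible way to get the behaviour documented for expand_numbers is a regex substitution with a callback that skips values above max_value. The sketch below is an assumption about the general technique, not the library's code; `expand_numbers_sketch`, `toy_to_words`, and `TOY_WORDS` are hypothetical, and the converter is injected so the example runs without num2words.

```python
import re
from typing import Callable

# Tiny illustrative converter; a real one would delegate to num2words.
TOY_WORDS = {3: "trois", 42: "quarante-deux"}


def toy_to_words(n: int) -> str:
    return TOY_WORDS[n]


def expand_numbers_sketch(
    text: str,
    to_words: Callable[[int], str],
    max_value: int = 1_000_000,
) -> str:
    """Replace each run of digits with words unless it exceeds max_value."""

    def replace(match: re.Match) -> str:
        value = int(match.group())
        if value > max_value:
            return match.group()  # leave large numbers untouched
        return to_words(value)

    return re.sub(r"\d+", replace, text)
```

This sketch ignores decimals, thousands separators, and negative numbers, which a production implementation would need to handle.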
kokorog2p.fr.numbers.expand_time(text: str) → str[source]

Expand time expressions like 14h30.

Args:

text: Text containing time expressions.

Returns:

Text with times expanded.

Example:

>>> expand_time("Le rendez-vous est à 14h30.")
'Le rendez-vous est à quatorze heures trente.'
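The 14h30 pattern can be matched with a two-group regular expression, one group for hours and one for minutes. The following is a minimal sketch under that assumption; `TIME_RE` and `expand_time_sketch` are hypothetical names, the number converter is injected, and forms such as a bare "14h", "14h00", or the singular "une heure" are deliberately not handled.

```python
import re
from typing import Callable

# Matches times written as <hours>h<minutes>, e.g. 14h30 or 9h05.
TIME_RE = re.compile(r"\b(\d{1,2})h(\d{2})\b")


def expand_time_sketch(text: str, to_words: Callable[[int], str]) -> str:
    """Rewrite each HHhMM time as '<hours> heures <minutes>' in words."""

    def replace(match: re.Match) -> str:
        hours = int(match.group(1))
        minutes = int(match.group(2))
        return f"{to_words(hours)} heures {to_words(minutes)}"

    return TIME_RE.sub(replace, text)
```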

kokorog2p.fr.numbers.expand_currency(text: str) → str[source]

Expand currency amounts.

Args:

text: Text containing currency amounts.

Returns:

Text with currency expanded.

Example:

>>> expand_currency("Ça coûte 5€.")
'Ça coûte cinq euros.'
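Currency expansion can follow the same substitution approach: capture the amount in front of the symbol, convert it, and append the unit name. The sketch below covers whole-euro amounts only and is an illustration rather than the library's implementation; `EURO_RE` and `expand_euros_sketch` are hypothetical, and cents, other currencies, and symbols placed before the amount are out of scope.

```python
import re
from typing import Callable

# Matches a whole number followed by an optional space and the euro sign.
EURO_RE = re.compile(r"(\d+)\s?€")


def expand_euros_sketch(text: str, to_words: Callable[[int], str]) -> str:
    """Rewrite whole-euro amounts such as '5€' as words."""

    def replace(match: re.Match) -> str:
        amount = int(match.group(1))
        # French uses the singular for amounts below two.
        unit = "euro" if amount < 2 else "euros"
        return f"{to_words(amount)} {unit}"

    return EURO_RE.sub(replace, text)
```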

kokorog2p.fr.numbers.expand_ordinal(text: str) → str[source]

Expand ordinal numbers like 1er, 2ème, etc.

Args:

text: Text containing ordinal numbers.

Returns:

Text with ordinals expanded.

Example:

>>> expand_ordinal("Le 1er janvier")
'Le premier janvier'

kokorog2p.fr.numbers.is_available() → bool[source]

Check if num2words is available.

Returns:

True if num2words is installed.
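An availability check of this kind is commonly paired with a guarded call into num2words, so that a missing optional dependency produces a clear ImportError instead of a failure at import time. The sketch below shows that pairing; `num2words_available` and `to_french` are hypothetical helpers, while `num2words(n, lang="fr", to=...)` is the num2words package's own entry point.

```python
import importlib.util


def num2words_available() -> bool:
    """Return True if the num2words package can be imported."""
    return importlib.util.find_spec("num2words") is not None


def to_french(n: int, ordinal: bool = False) -> str:
    """Convert an integer to French words, requiring num2words."""
    if not num2words_available():
        raise ImportError("num2words is required: pip install num2words")
    from num2words import num2words  # imported lazily, only when needed

    return num2words(n, lang="fr", to="ordinal" if ordinal else "cardinal")
```

Callers that want to degrade gracefully can test `num2words_available()` first and leave digits unexpanded when it returns False.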

Examples

from kokorog2p.fr import FrenchG2P

g2p = FrenchG2P(language="fr-fr")
tokens = g2p("Bonjour le monde!")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
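To turn the per-token output above into a single phoneme string, the `phonemes` attributes can be joined. The helper below is a sketch that relies only on the `text` and `phonemes` attributes used in the example; `join_phonemes` is a hypothetical name, and the assumption that some tokens may carry empty or missing phonemes is not confirmed by this page. The `demo` objects are stand-ins so the snippet runs without the library installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def join_phonemes(tokens: Iterable[Any], sep: str = " ") -> str:
    """Join the phonemes of all tokens, skipping tokens without any."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


# Stand-in objects exposing the same two attributes as the tokens above;
# the phoneme strings are illustrative.
demo = [
    SimpleNamespace(text="Bonjour", phonemes="bɔ̃ʒuʁ"),
    SimpleNamespace(text="monde", phonemes="mɔ̃d"),
    SimpleNamespace(text="!", phonemes=None),
]

print(join_phonemes(demo))
```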