Italian API

Italian G2P provides rule-based phoneme conversion for Italian, designed for Kokoro TTS models.

Main Class

class kokorog2p.it.ItalianG2P(language: str = 'it-it', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'it_core_news_sm', mark_stress: bool = True, mark_gemination: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, version: str = '1.0', **kwargs: Any)

Bases: G2PBase

Italian G2P converter using rule-based phonemization.

This class provides grapheme-to-phoneme conversion for Italian text using Italian orthographic rules. Italian has fairly regular spelling, making rule-based conversion quite accurate.

Example:

>>> from kokorog2p.it import ItalianG2P
>>> g2p = ItalianG2P()
>>> tokens = g2p("Ciao, come stai?")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")

__init__(language: str = 'it-it', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'it_core_news_sm', mark_stress: bool = True, mark_gemination: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, version: str = '1.0', **kwargs: Any) → None

Initialize the Italian G2P converter.

Args:
    language: Language code (default: 'it-it').
    use_espeak_fallback: Reserved for future espeak integration.
    use_spacy: Whether to use spaCy for tokenization and POS tagging. Defaults to False to preserve existing behavior.
    spacy_model: spaCy Italian model package to load when use_spacy=True (e.g., "it_core_news_sm", "it_core_news_md", "it_core_news_lg").
    mark_stress: Whether to mark primary stress with ˈ.
    mark_gemination: Whether to mark double consonants with ː.
    expand_abbreviations: Whether to expand common abbreviations.
    enable_context_detection: Whether abbreviation expansion is context-aware.
    version: Target model version.
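The flags above are independent keyword arguments and can be combined. A minimal sketch, using only parameter names from the documented signature; the sample sentence is arbitrary, and the import is guarded so the snippet still runs (and simply prints nothing) when kokorog2p is not installed:

```python
# Keyword arguments taken from the documented signature.
options = {
    "mark_stress": False,           # omit the primary-stress mark ˈ
    "mark_gemination": True,        # keep ː on double consonants
    "expand_abbreviations": False,  # leave abbreviations as written
}

try:
    from kokorog2p.it import ItalianG2P
except ImportError:  # library not available; nothing to demonstrate
    ItalianG2P = None

if ItalianG2P is not None:
    g2p = ItalianG2P(**options)
    print(g2p.phonemize("Una bella notte"))
```

Keeping the options in a dictionary makes it easy to build two converters that differ in a single flag and compare their output.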

__call__(text: str) → list[GToken]

Convert text to a list of tokens with phonemes.

Args:
    text: Input text to convert.
Returns:
    List of GToken objects with phonemes assigned.

property nlp: object

Lazily initialize spaCy.

property spacy_tokenizer: SpacyTokenizer

Lazily initialize the spaCy tokenizer.

lookup(word: str, tag: str | None = None) → str | None

Look up a word’s phonemes.

Args:
    word: The word to look up.
    tag: Optional POS tag (ignored for Italian).
Returns:
    Phoneme string or None.

phonemize(text: str) → str

Convert text to phonemes.

Args:
    text: Input text to convert.
Returns:
    Phoneme string.

get_target_model() → str

Get the target Kokoro model variant for this G2P instance.

Returns:
    Model identifier as a version string ("1.0" or "1.1").

Examples

from kokorog2p.it import ItalianG2P

g2p = ItalianG2P(language="it-it")
tokens = g2p("Ciao mondo!")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Phonology Features

Italian phonology includes:

5 pure vowels (a, e, i, o, u) - always pronounced clearly
No vowel reduction (unlike English)
Predictable stress (usually penultimate syllable)
Gemination (double consonants) is phonemically distinctive
Palatals: gn [ɲ], gli [ʎ]
Affricates: z [ʦ/ʣ], c/ci [ʧ], g/gi [ʤ]
Consecutive vowels are treated as separate syllables; diphthongs such as those in "piano" or "uomo" are not modeled separately
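
Several of the features above can be illustrated with a toy converter. This is a simplified sketch written for this page, not the rules ItalianG2P actually implements: toy_g2p is a hypothetical function that handles single words only, leaves stress unmarked, passes z and all other letters through unchanged, and always drops an i between a palatal and a following vowel (which is wrong for words with a stressed i, such as "farmacia").

```python
VOWELS = set("aeiou")


def toy_g2p(word: str) -> str:
    """Toy pass covering gn, gli, ch/gh, soft c/g, and gemination."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        ch = w[i]
        # A doubled consonant letter is read once and marked long with ː.
        doubled = ch.isalpha() and ch not in VOWELS and w[i + 1:i + 2] == ch
        i += 2 if doubled else 1
        nxt = w[i:i + 1]  # letter after the (possibly doubled) consonant
        palatal = False
        if ch == "g" and nxt == "n":
            phone, i = "ɲ", i + 1            # gn -> [ɲ]
        elif ch == "g" and w[i:i + 2] == "li":
            phone, i = "ʎ", i + 1            # gl before i -> [ʎ]
            palatal = True
        elif ch in ("c", "g") and nxt == "h":
            phone = "k" if ch == "c" else "ɡ"  # ch/gh stay hard
            i += 1
        elif ch in ("c", "g"):
            palatal = nxt in ("e", "i")      # soft before e or i
            if palatal:
                phone = "ʧ" if ch == "c" else "ʤ"
            else:
                phone = "k" if ch == "c" else "ɡ"
        else:
            phone = ch
        # After a palatal, an i directly before another vowel is silent.
        if palatal and w[i:i + 1] == "i" and w[i + 1:i + 2] in VOWELS:
            i += 1
        out.append(phone + ("ː" if doubled else ""))
    return "".join(out)


for sample in ("ciao", "gnocchi", "figlio", "notte", "gelato"):
    print(sample, "->", toy_g2p(sample))
```

The loop reads a doubled consonant as one unit before applying the context rules, so "cch" in "gnocchi" becomes a single long [k] rather than two separate sounds.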