Spanish API
Spanish G2P provides rule-based phoneme conversion for Spanish, designed for Kokoro TTS models.
Main Class
- class kokorog2p.es.SpanishG2P(language: str = 'es', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'es_core_news_sm', mark_stress: bool = True, dialect: str = 'es', expand_abbreviations: bool = True, enable_context_detection: bool = True, version: str = '1.0', **kwargs: Any)
Bases: G2PBase
Spanish G2P converter using rule-based phonemization.
This class provides grapheme-to-phoneme conversion for Spanish text using Spanish orthographic rules. Spanish has fairly regular spelling, making rule-based conversion quite accurate.
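To illustrate why regular spelling suits a rule-based approach, here is a minimal sketch of a few context-dependent letter rules. The name toy_g2p and its rule tables are hypothetical and are not taken from kokorog2p; the sketch ignores stress, gu/gü before e or i, x, y, and allophones, so it should not be read as the library's actual rule set.

```python
# Hypothetical toy rules, NOT the kokorog2p implementation.
# Digraphs are matched first, then context-dependent c/g/r, then single letters.
DIGRAPHS = {"ch": "ʧ", "ll": "ʎ", "rr": "r", "qu": "k"}
SINGLE = {"ñ": "ɲ", "j": "x", "z": "θ", "h": "", "v": "b"}
FRONT_VOWELS = "eiéí"


def toy_g2p(word: str) -> str:
    """Map a Spanish word to a rough European Spanish phoneme string."""
    w = word.lower()
    out = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = w[i]
        if ch == "c":
            # c is [θ] before e/i, otherwise [k]
            out.append("θ" if nxt and nxt in FRONT_VOWELS else "k")
        elif ch == "g":
            # g is [x] before e/i, otherwise [ɡ]
            out.append("x" if nxt and nxt in FRONT_VOWELS else "ɡ")
        elif ch == "r":
            # word-initial r is a trill, other single r is a tap
            out.append("r" if i == 0 else "ɾ")
        else:
            out.append(SINGLE.get(ch, ch))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["cena", "casa", "gente", "chico", "queso", "pero", "perro"]:
        print(word, "->", toy_g2p(word))
```

Each rule depends only on the letter and its immediate neighbor, which is why a small table plus a few conditionals covers much of the consonant system.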
- Example:
>>> g2p = SpanishG2P()
>>> tokens = g2p("Hola, ¿cómo estás?")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
- __init__(language: str = 'es', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'es_core_news_sm', mark_stress: bool = True, dialect: str = 'es', expand_abbreviations: bool = True, enable_context_detection: bool = True, version: str = '1.0', **kwargs: Any) -> None
Initialize the Spanish G2P converter.
- Args:
language: Language code (default: 'es').
use_espeak_fallback: Reserved for future espeak integration.
use_goruut_fallback: Reserved for future goruut integration.
use_spacy: Whether to use spaCy for tokenization and POS tagging. Defaults to False to preserve existing behavior.
spacy_model: spaCy Spanish model package to load when use_spacy=True (e.g., "es_core_news_sm", "es_core_news_md", "es_core_news_lg").
mark_stress: Whether to mark primary stress with ˈ.
dialect: "es" for European Spanish (with θ), "la" for Latin American (θ→s).
expand_abbreviations: Whether to expand common abbreviations.
enable_context_detection: Whether abbreviation expansion is context-aware.
version: Target model version.
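The dialect option describes a θ→s mapping for Latin American Spanish. The following sketch shows that mapping applied to an already-phonemized string. The function apply_dialect is a hypothetical helper written for illustration; how SpanishG2P applies the dialect internally is not documented here.

```python
def apply_dialect(phonemes: str, dialect: str = "es") -> str:
    """Hypothetical post-processing step, not part of kokorog2p.

    "es" keeps the European Spanish theta; "la" merges it into s (seseo).
    """
    if dialect == "es":
        return phonemes
    if dialect == "la":
        return phonemes.replace("θ", "s")
    raise ValueError(f"unknown dialect: {dialect!r}")


if __name__ == "__main__":
    # "zapato" phonemized with European Spanish theta and a stress mark
    print(apply_dialect("θaˈpato", "es"))
    print(apply_dialect("θaˈpato", "la"))
```

Treating the dialect as a single substitution keeps the rest of the rule set shared between the two variants.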
- __call__(text: str) -> list[GToken]
Convert text to a list of tokens with phonemes.
- Args:
text: Input text to convert.
- Returns:
List of GToken objects with phonemes assigned.
- property spacy_tokenizer: SpacyTokenizer
Lazily initialize the spaCy tokenizer.
- lookup(word: str, tag: str | None = None) -> str | None
Look up a word’s phonemes.
- Args:
word: The word to look up.
tag: Optional POS tag (ignored for Spanish).
- Returns:
Phoneme string or None.
Examples
from kokorog2p.es import SpanishG2P
g2p = SpanishG2P(language="es-es")
tokens = g2p("¡Hola mundo!")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
Phonology Features
Spanish phonology includes:
5 pure vowels (a, e, i, o, u) - always pronounced clearly
No vowel reduction (unlike English)
Predictable stress (penultimate for words ending in a vowel, n, or s; final for words ending in any other consonant; a written accent overrides the default)
Palatal sounds: ñ [ɲ], ll [ʎ] (or [j] in most dialects), ch [ʧ]
Jota: j/g+e/i [x]
Theta: z/c+e/i [θ] in European Spanish (or [s] in Latin America)
Tap vs trill: r [ɾ] vs rr/initial r [r]
No consonant cluster simplification
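The stress rule in the list above can be sketched as a small classifier. It follows the standard orthographic convention: a written accent marks the stressed vowel; otherwise words ending in a vowel, n, or s take penultimate stress and all other words take final stress. The function default_stress is hypothetical and only names which rule applies; it does not syllabify the word and is not the library's implementation.

```python
ACCENTED_VOWELS = set("áéíóú")
PENULTIMATE_ENDINGS = set("aeiouns")


def default_stress(word: str) -> str:
    """Return which stress rule applies: 'written accent', 'penultimate' or 'final'.

    Hypothetical helper for illustration; monosyllables are not treated specially.
    """
    w = word.lower()
    if not w:
        raise ValueError("empty word")
    if any(ch in ACCENTED_VOWELS for ch in w):
        # An explicit accent mark overrides the default placement
        return "written accent"
    if w[-1] in PENULTIMATE_ENDINGS:
        return "penultimate"
    return "final"


if __name__ == "__main__":
    for word in ["casa", "hablan", "lunes", "reloj", "ciudad", "estás"]:
        print(word, "->", default_stress(word))
```

A full converter would still need syllabification, including diphthong handling, to place the ˈ mark on the right syllable.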