Portuguese API
Portuguese G2P provides rule-based phoneme conversion for Brazilian Portuguese, designed for Kokoro TTS models.
Main Class
- class kokorog2p.pt.PortugueseG2P(language: str = 'pt-br', use_espeak_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'pt_core_news_sm', mark_stress: bool = True, affricate_ti_di: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, dialect: str = 'br', version: str = '1.0', **kwargs: Any)
Bases: G2PBase
Brazilian Portuguese G2P converter using rule-based phonemization.
This class provides grapheme-to-phoneme conversion for Brazilian Portuguese text using Portuguese orthographic rules.
- Example:
>>> g2p = PortugueseG2P()
>>> tokens = g2p("Olá, como está?")
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
- __init__(language: str = 'pt-br', use_espeak_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'pt_core_news_sm', mark_stress: bool = True, affricate_ti_di: bool = True, expand_abbreviations: bool = True, enable_context_detection: bool = True, dialect: str = 'br', version: str = '1.0', **kwargs: Any) -> None
Initialize the Portuguese G2P converter.
- Args:
language: Language code (default: 'pt-br').
use_espeak_fallback: Reserved for future espeak integration.
use_spacy: Whether to use spaCy for tokenization and POS tagging. Defaults to False to preserve existing behavior.
spacy_model: spaCy Portuguese model package to load when use_spacy=True (e.g., "pt_core_news_sm", "pt_core_news_md", "pt_core_news_lg").
mark_stress: Whether to mark primary stress with ˈ.
affricate_ti_di: Whether to affricate /t d/ before /i/ (Brazilian feature).
expand_abbreviations: Whether to expand common abbreviations.
enable_context_detection: Context-aware abbreviation expansion.
dialect: "br" for Brazilian, "pt" for European Portuguese. Affects number pronunciation (dezesseis vs. dezasseis).
version: Target model version.
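As a minimal sketch of how the constructor arguments above might be combined per dialect, the helper below builds a keyword-argument dictionary. The names `g2p_kwargs` and `make_g2p` are hypothetical and not part of kokorog2p. Switching `affricate_ti_di` off for European Portuguese is an assumption, based only on the parameter being described as a Brazilian feature.

```python
from typing import Any


def g2p_kwargs(dialect: str = "br") -> dict[str, Any]:
    """Build keyword arguments for PortugueseG2P (hypothetical helper)."""
    if dialect not in ("br", "pt"):
        raise ValueError(f"dialect must be 'br' or 'pt', got {dialect!r}")
    return {
        "dialect": dialect,
        # Assumption: affrication is disabled for European Portuguese,
        # because the docs describe it as a Brazilian feature.
        "affricate_ti_di": dialect == "br",
        "mark_stress": True,
        "expand_abbreviations": True,
    }


def make_g2p(dialect: str = "br"):
    """Construct a converter; requires kokorog2p to be installed."""
    # Imported lazily so g2p_kwargs stays usable without the package.
    from kokorog2p.pt import PortugueseG2P

    return PortugueseG2P(**g2p_kwargs(dialect))
```

Calling `make_g2p("pt")` would then pass `dialect="pt"` and `affricate_ti_di=False` to the constructor; all other parameters keep their documented defaults.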
- __call__(text: str) -> list[GToken]
Convert text to a list of tokens with phonemes.
- Args:
text: Input text to convert.
- Returns:
List of GToken objects with phonemes assigned.
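The returned list can be reduced to a single phoneme string. The sketch below relies only on the `text` and `phonemes` attributes that this page's examples use. `FakeToken` and `phoneme_string` are hypothetical names, the phoneme values are placeholders rather than real library output, and skipping tokens with empty or missing phonemes is a defensive assumption about what the library may return.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


class TokenLike(Protocol):
    """Minimal shape used here: the two attributes shown in this page."""

    text: str
    phonemes: Optional[str]


@dataclass
class FakeToken:
    """Stand-in for GToken so the sketch runs without kokorog2p."""

    text: str
    phonemes: Optional[str] = None


def phoneme_string(tokens: Iterable[TokenLike], sep: str = " ") -> str:
    """Join the phonemes of every token that has any, skipping the rest."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


# Placeholder phoneme strings, not actual PortugueseG2P output.
tokens = [FakeToken("Olá", "oˈla"), FakeToken(","), FakeToken("mundo", "ˈmũdu")]
print(phoneme_string(tokens))
```

With a real converter, the same helper would be called as `phoneme_string(g2p("Olá, mundo"))`.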
- property spacy_tokenizer: SpacyTokenizer
Lazily initialize the spaCy tokenizer.
- lookup(word: str, tag: str | None = None) -> str | None
Look up a word’s phonemes.
- Args:
word: The word to look up.
tag: Optional POS tag (ignored for Portuguese).
- Returns:
Phoneme string or None.
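Because `lookup` may return None, callers usually need a fallback. The sketch below shows one way to handle that. `lookup_or` and `StubG2P` are hypothetical: the stub only mimics the documented signature so the example runs without kokorog2p, and its table holds a placeholder phoneme string.

```python
from typing import Optional, Protocol


class HasLookup(Protocol):
    """Anything exposing the documented lookup(word, tag) signature."""

    def lookup(self, word: str, tag: Optional[str] = None) -> Optional[str]: ...


def lookup_or(g2p: HasLookup, word: str, default: str = "") -> str:
    """Return the looked-up phonemes, or `default` when lookup yields None."""
    phonemes = g2p.lookup(word)
    return default if phonemes is None else phonemes


class StubG2P:
    """Hypothetical stand-in backed by a plain dictionary."""

    def __init__(self, table: dict[str, str]) -> None:
        self._table = table

    def lookup(self, word: str, tag: Optional[str] = None) -> Optional[str]:
        # The tag is accepted but unused, mirroring the documented behavior.
        return self._table.get(word.lower())


stub = StubG2P({"olá": "oˈla"})  # placeholder phonemes, not library output
print(lookup_or(stub, "Olá"))
print(lookup_or(stub, "xyz", default="?"))
```

Passing a `PortugueseG2P` instance instead of the stub should work the same way, assuming its `lookup` behaves as documented above.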
Examples
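from kokorog2p.pt import PortugueseG2P

# Create a converter for Brazilian Portuguese (the default language code).
g2p = PortugueseG2P(language="pt-br")
tokens = g2p("Olá mundo!")
# Each token carries its original text and the assigned phonemes.
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")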
Phonology Features
Brazilian Portuguese phonology includes:
7 oral vowels (a, e, ɛ, i, o, ɔ, u) with open/closed e/o variants
5 nasal vowels (ã, ẽ, ĩ, õ, ũ)
Nasal diphthongs (ãw̃, õj̃, etc.)
Palatalization: lh [ʎ], nh [ɲ], x/ch [ʃ]
Affrication: t+i [ʧ], d+i [ʤ] (Brazilian Portuguese feature)
Sibilants: s [s/z], x [ʃ], z [z]
Liquids: word-initial r [ʁ/x/h] (varies by dialect), rr [ʁ/x], single r between vowels [ɾ]
No θ sound (unlike European Spanish)
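To make the digraph and affrication items above concrete, here is a toy sketch of those two rewrites as plain string substitutions. It is not the library's implementation: `toy_consonants` is a hypothetical name, only the listed consonant patterns are rewritten, and vowels, stress, nasalization, and context-dependent letters such as x and s are left untouched.

```python
import re

# Digraphs from the list above, rewritten before single letters.
DIGRAPHS = {"lh": "ʎ", "nh": "ɲ", "ch": "ʃ", "rr": "ʁ"}


def toy_consonants(word: str, affricate_ti_di: bool = True) -> str:
    """Rewrite a few consonant patterns; everything else stays orthographic."""
    out = word.lower()
    for digraph, phone in DIGRAPHS.items():
        out = out.replace(digraph, phone)
    if affricate_ti_di:
        # t and d become affricates only when directly followed by i or í.
        out = re.sub(r"t(?=[ií])", "ʧ", out)
        out = re.sub(r"d(?=[ií])", "ʤ", out)
    return out


for word in ("tia", "dia", "filho", "ninho", "carro"):
    print(word, "->", toy_consonants(word))
```

Setting `affricate_ti_di=False` leaves t and d unchanged, which mirrors the intent of the constructor flag of the same name.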