Hebrew API
Hebrew G2P converts Hebrew text with diacritics (nikud) to phonemes using the phonikud package.
Main Class
- class kokorog2p.he.HebrewG2P(language: str = 'he', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, load_silver: bool = True, load_gold: bool = True, preserve_punctuation: bool = True, preserve_stress: bool = True, version: str = '1.0', **kwargs: Any)
Bases: G2PBase
Hebrew G2P using phonikud for phonemization.
This class converts Hebrew text to phonemes using the phonikud package. For accurate phonemization, the input is expected to carry enhanced diacritics (nikud).
- Example:
>>> g2p = HebrewG2P()
>>> tokens = g2p("שָׁלוֹם")  # "shalom" with nikud
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")
- __init__(language: str = 'he', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, load_silver: bool = True, load_gold: bool = True, preserve_punctuation: bool = True, preserve_stress: bool = True, version: str = '1.0', **kwargs: Any) None
Initialize the Hebrew G2P.
- Args:
  - language: Language code (e.g., 'he', 'he-il', 'heb', 'hebrew').
  - use_espeak_fallback: Whether to use espeak for unknown words. Not typically used for Hebrew. Defaults to False.
  - use_goruut_fallback: Whether to use goruut for unknown words. Not typically used for Hebrew. Defaults to False.
  - load_silver: Reserved for API consistency. Hebrew doesn't use dictionary tiers. Defaults to True.
  - load_gold: Reserved for API consistency. Hebrew doesn't use dictionary tiers. Defaults to True.
  - preserve_punctuation: Whether to preserve punctuation in output. Defaults to True.
  - preserve_stress: Whether to preserve stress markers in output. Defaults to True.
  - **kwargs: Additional arguments passed to phonikud.phonemize().
- property phonikud
Lazy initialization of phonikud backend.
- __call__(text: str) list[GToken]
Convert Hebrew text to tokens with phonemes.
- Args:
text: Input Hebrew text to convert (preferably with nikud).
- Returns:
List of GToken objects with phonemes.
Examples
from kokorog2p.he import HebrewG2P

g2p = HebrewG2P(language="he-il")
tokens = g2p("שלום עולם!")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
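Because phonemization accuracy depends on nikud, it can help to check whether a string carries any diacritics before passing it to the converter. The following is a minimal sketch using only the Python standard library. The helper name has_nikud and the code point range it inspects (U+05B0 to U+05C7, restricted to non-spacing marks) are assumptions made for illustration; they are not part of kokorog2p or phonikud, and phonikud's own notion of "enhanced" nikud may be broader.

```python
import unicodedata

# Assumed range for Hebrew points (nikud) in the Unicode Hebrew block.
# Punctuation inside this range (e.g. maqaf, sof pasuq) is filtered out
# by the category check below.
NIKUD_START = 0x05B0
NIKUD_END = 0x05C7


def has_nikud(text: str) -> bool:
    """Return True if text contains at least one Hebrew point (hypothetical helper)."""
    return any(
        NIKUD_START <= ord(ch) <= NIKUD_END
        and unicodedata.category(ch) == "Mn"  # non-spacing combining mark
        for ch in text
    )


# One vocalized and one unvocalized input, taken from the examples in this page.
for sample in ("שָׁלוֹם", "שלום עולם!"):
    print(f"{sample!r}: has nikud = {has_nikud(sample)}")
```

A check like this could be used to warn callers, or to route unvocalized text through a separate diacritization step, before calling HebrewG2P.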
Implementation
The Hebrew G2P implementation uses the phonikud package, which:
- Handles Hebrew text with diacritics (nikud)
- Converts Hebrew to IPA phoneme representation
- Supports both modern and biblical Hebrew
Reference: https://github.com/thewh1teagle/phonikud
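The phonikud property is documented above as lazily initialized, meaning the backend is loaded on first access, not at construction time. The sketch below shows one common way to write such a property. It is not the actual HebrewG2P code: the class LazyBackend and its attribute names are hypothetical, and the demonstration loads the standard-library json module so that it runs without phonikud installed.

```python
import importlib
from types import ModuleType
from typing import Optional


class LazyBackend:
    """Illustrative holder that imports its backend module on first use."""

    def __init__(self, module_name: str = "phonikud") -> None:
        self._module_name = module_name
        self._module: Optional[ModuleType] = None  # nothing imported yet

    @property
    def backend(self) -> ModuleType:
        # Import on first access, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module


# Demonstrated with a standard-library module as a stand-in for phonikud.
holder = LazyBackend("json")
first = holder.backend   # triggers the import
second = holder.backend  # returns the cached module
print(first is second)
```

Deferring the import this way keeps object construction cheap and postpones any import error until the backend is actually needed.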