Hebrew API

Hebrew G2P converts Hebrew text to phonemes using the phonikud package, which works on text marked with diacritics (nikud).

Main Class

class kokorog2p.he.HebrewG2P(language: str = 'he', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, load_silver: bool = True, load_gold: bool = True, preserve_punctuation: bool = True, preserve_stress: bool = True, version: str = '1.0', **kwargs: Any)

Bases: G2PBase

Hebrew G2P using phonikud for phonemization.

This class converts Hebrew text to phonemes using the phonikud package. For accurate phonemization, the input text should carry enhanced diacritics (nikud).

Example:
>>> g2p = HebrewG2P()
>>> tokens = g2p("שָׁלוֹם")  # "shalom" with nikud
>>> for token in tokens:
...     print(f"{token.text} -> {token.phonemes}")

__init__(language: str = 'he', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, load_silver: bool = True, load_gold: bool = True, preserve_punctuation: bool = True, preserve_stress: bool = True, version: str = '1.0', **kwargs: Any) -> None

Initialize the Hebrew G2P.

Args:

language: Language code (e.g., 'he', 'he-il', 'heb', 'hebrew').

use_espeak_fallback: Whether to use espeak for unknown words. Not typically used for Hebrew. Defaults to False.

use_goruut_fallback: Whether to use goruut for unknown words. Not typically used for Hebrew. Defaults to False.

load_silver: Reserved for API consistency. Hebrew doesn't use dictionary tiers. Defaults to True.

load_gold: Reserved for API consistency. Hebrew doesn't use dictionary tiers. Defaults to True.

preserve_punctuation: Whether to preserve punctuation in output. Defaults to True.

preserve_stress: Whether to preserve stress markers in output. Defaults to True.

**kwargs: Additional arguments passed to phonikud.phonemize().

property phonikud

The phonikud backend, initialized lazily on first access.

__call__(text: str) -> list[GToken]

Convert Hebrew text to tokens with phonemes.

Args:

text: Input Hebrew text to convert (preferably with nikud).

Returns:

List of GToken objects with phonemes.
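Downstream code often needs a single phoneme string rather than a token list. The sketch below shows one way to collapse the result. `join_phonemes` and `StubToken` are hypothetical names, not part of kokorog2p, and the only assumption about GToken is that it exposes the `text` and `phonemes` attributes used in the examples on this page. The stub phoneme strings are placeholders, not real phonikud output.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


class HasPhonemes(Protocol):
    """Minimal shape assumed for a token: text plus optional phonemes."""

    text: str
    phonemes: Optional[str]


def join_phonemes(tokens: Iterable[HasPhonemes], sep: str = " ") -> str:
    """Join the phonemes of all tokens, skipping tokens that have none."""
    return sep.join(t.phonemes for t in tokens if t.phonemes)


@dataclass
class StubToken:
    """Hypothetical stand-in for GToken so the sketch runs without kokorog2p."""

    text: str
    phonemes: Optional[str]


# Placeholder values only; a real call would be join_phonemes(g2p(text)).
stub_tokens = [StubToken("w1", "aa"), StubToken("?", None), StubToken("w2", "bb")]
joined = join_phonemes(stub_tokens)
```

Skipping empty or missing phonemes is a design choice made here for illustration; whether GToken can actually carry `None` in `phonemes` is an assumption worth checking against the library.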

lookup(word: str, tag: str | None = None) -> str | None

Look up a Hebrew word and return its phonetic representation.

Args:

word: The word to look up (preferably with nikud).

tag: Optional POS tag (not used in Hebrew G2P).

Returns:

Phoneme string, or None if phonikud is not available.

get_target_model() -> str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: version string (“1.1” or “1.0”).

Examples

from kokorog2p.he import HebrewG2P

g2p = HebrewG2P(language="he-il")
tokens = g2p("שלום עולם!")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Implementation

The Hebrew G2P implementation uses the phonikud package, which:

  • Handles Hebrew text with diacritics (nikud)

  • Converts Hebrew to IPA phoneme representation

  • Supports both modern and biblical Hebrew

Reference: https://github.com/thewh1teagle/phonikud
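
Because accuracy depends on nikud, it can help to check the input before phonemizing. The following is a standard-library sketch, not part of kokorog2p or phonikud. `has_nikud` is a hypothetical helper, and it treats the nonspacing marks in U+05B0 to U+05C7 as nikud, which is an assumption about what counts as a vowel point (cantillation marks and Hebrew punctuation are deliberately excluded).

```python
import unicodedata

# Hebrew points: U+05B0 (sheva) through U+05C7 (qamats qatan).
# The range also holds punctuation such as maqaf (U+05BE), so the
# category check below keeps only nonspacing marks ("Mn").
_NIKUD_FIRST = "\u05B0"
_NIKUD_LAST = "\u05C7"


def has_nikud(text: str) -> bool:
    """Return True if text contains at least one Hebrew point."""
    return any(
        _NIKUD_FIRST <= ch <= _NIKUD_LAST and unicodedata.category(ch) == "Mn"
        for ch in text
    )


# "shalom" written with and without points, spelled out as escapes
# so the combining marks are unambiguous.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
plain = "\u05E9\u05DC\u05D5\u05DD"

pointed_ok = has_nikud(pointed)
plain_ok = has_nikud(plain)
```

A caller could use such a check to warn when unpointed text is about to be phonemized, or to route it through a separate diacritization step first.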