Korean API

Korean G2P provides phoneme conversion using MeCab for morphological analysis and custom phonological rules based on Korean Standard Pronunciation.

Main Class

class kokorog2p.ko.KoreanG2P(language: str = 'ko', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'ko_core_news_sm', load_silver: bool = True, load_gold: bool = True, use_dict: bool = True, group_vowels: bool = False, to_syl: bool = False, version: str = '1.0', **kwargs)

Bases: G2PBase

Korean G2P using MeCab and Korean phonological rules.

This class converts Korean text to phonemes using:

1. Idiom/abbreviation replacement
2. English to Hangul conversion
3. MeCab POS tagging
4. Number spelling
5. Hangul decomposition
6. Phonological rules application
7. Jamo composition
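The Hangul decomposition step can be illustrated with the standard Unicode arithmetic for precomposed syllables. This is a minimal sketch using only the standard library; the helper name `decompose` is hypothetical and is not part of kokorog2p, and the library's own decomposition may differ in detail (for example, in which jamo block it emits).

```python
import unicodedata

# Unicode constants for precomposed Hangul syllables (U+AC00..U+D7A3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = 19 * N_COUNT        # 11172 syllables in total


def decompose(text: str) -> str:
    """Split each Hangul syllable into conjoining jamo (hypothetical helper)."""
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            out.append(chr(L_BASE + index // N_COUNT))               # initial
            out.append(chr(V_BASE + (index % N_COUNT) // T_COUNT))   # vowel
            if index % T_COUNT:                                      # optional final
                out.append(chr(T_BASE + index % T_COUNT))
        else:
            out.append(ch)  # leave non-Hangul characters untouched
    return "".join(out)


jamo = decompose("안녕")
# Unicode NFD normalization performs the same decomposition.
assert jamo == unicodedata.normalize("NFD", "안녕")
print([f"U+{ord(c):04X}" for c in jamo])
```

In practice `unicodedata.normalize("NFD", text)` gives the same result in one call; the explicit arithmetic is shown only to make the step visible.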

Example:
>>> g2p = KoreanG2P()
>>> tokens = g2p("안녕하세요")
__init__(language: str = 'ko', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'ko_core_news_sm', load_silver: bool = True, load_gold: bool = True, use_dict: bool = True, group_vowels: bool = False, to_syl: bool = False, version: str = '1.0', **kwargs) -> None

Initialize the Korean G2P.

Args:

language: Language code (e.g., 'ko', 'ko-kr').

use_espeak_fallback: Whether to use espeak for unknown words. Not typically used for Korean. Defaults to False.

use_goruut_fallback: Whether to use goruut for unknown words. Not typically used for Korean. Defaults to False.

use_spacy: Reserved for API consistency. Korean uses the g2pK backend for tokenization and phonemization.

spacy_model: Reserved for API consistency when use_spacy is enabled.

load_silver: Reserved for API consistency. Korean doesn't use dictionary tiers. Defaults to True.

load_gold: Reserved for API consistency. Korean doesn't use dictionary tiers. Defaults to True.

use_dict: Whether to use the MeCab dictionary for POS tagging. Defaults to True. If False, MeCab annotation is skipped.

group_vowels: If True, merge similar vowels (e.g., ㅐ->ㅔ). Defaults to False.

to_syl: If True, compose jamo back into syllables. Defaults to False (returns decomposed jamo).

**kwargs: Additional arguments.
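To make the effect of group_vowels and to_syl concrete, the sketch below imitates both options with the standard library. It is an illustration, not the library's code: `merge_vowels` and `compose` are hypothetical helpers, the merge table contains only the ㅐ->ㅔ pair named above (any further pairs the library may merge are deliberately left out), and it assumes the jamo are Unicode conjoining jamo.

```python
import unicodedata

# Only the merge documented above; conjoining jamo U+1162 (ᅢ) -> U+1166 (ᅦ).
VOWEL_MERGES = {"\u1162": "\u1166"}


def merge_vowels(jamo: str) -> str:
    """Replace each vowel with its merged counterpart (cf. group_vowels=True)."""
    return "".join(VOWEL_MERGES.get(ch, ch) for ch in jamo)


def compose(jamo: str) -> str:
    """Recombine conjoining jamo into syllable blocks (cf. to_syl=True)."""
    return unicodedata.normalize("NFC", jamo)


decomposed = unicodedata.normalize("NFD", "내가")  # four separate jamo
merged = merge_vowels(decomposed)                   # ᅢ replaced by ᅦ
print(len(decomposed), compose(merged))
```

With to_syl left at its default of False, the output corresponds to the decomposed form (`merged` here) rather than the recomposed syllables.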

property g2pk

Lazy initialization of g2pK backend.

__call__(text: str) -> list[GToken]

Convert Korean text to tokens with phonemes.

Args:

text: Input Korean text to convert.

Returns:

List of GToken objects with phonemes.

lookup(word: str, tag: str | None = None) -> str | None

Look up a Korean word and return its phonetic representation.

Args:

word: The word to look up.

tag: Optional POS tag (not used in Korean G2P).

Returns:

Phoneme string or None if empty.

get_target_model() -> str

Get the target Kokoro model variant for this G2P instance.

Returns:

Model identifier: version string (“1.1” or “1.0”).

Examples

from kokorog2p.ko import KoreanG2P

g2p = KoreanG2P(language="ko-kr")
tokens = g2p("안녕하세요!")

for token in tokens:
    print(f"{token.text} -> {token.phonemes}")

Implementation

The Korean G2P implementation is based on g2pK by kyubyong and uses:

  • MeCab for morphological analysis

  • Korean Standard Pronunciation rules

  • Jamo-to-IPA conversion for phoneme output

Reference: https://github.com/kyubyong/g2pK
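
The jamo-to-IPA conversion listed above can be pictured as a table lookup over decomposed jamo. The sketch below is an assumption-laden illustration only: `JAMO_TO_IPA` and `to_ipa` are hypothetical names, the table covers a handful of jamo with their usual broad IPA values, and the symbols kokorog2p actually emits may differ.

```python
import unicodedata

# Tiny illustrative table (hypothetical, not the library's mapping).
JAMO_TO_IPA = {
    "\u1102": "n",   # ᄂ initial
    "\u1106": "m",   # ᄆ initial
    "\u110B": "",    # ᄋ initial is silent
    "\u1112": "h",   # ᄒ initial
    "\u1161": "a",   # ᅡ
    "\u1169": "o",   # ᅩ
    "\u116E": "u",   # ᅮ
    "\u1175": "i",   # ᅵ
    "\u11AB": "n",   # ᆫ final
    "\u11B7": "m",   # ᆷ final
    "\u11BC": "ŋ",   # ᆼ final
}


def to_ipa(text: str) -> str:
    """Decompose Hangul and map each jamo through the table.

    Characters missing from the table are passed through unchanged.
    """
    jamo = unicodedata.normalize("NFD", text)
    return "".join(JAMO_TO_IPA.get(ch, ch) for ch in jamo)


print(to_ipa("나무"), to_ipa("항"))
```

Note that initial and final consonants are distinct code points after decomposition, which is why ᄋ (silent as an initial) and ᆼ (pronounced ŋ as a final) can be mapped differently. A real pipeline would apply the phonological rules before this lookup.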