Korean API
Korean G2P provides phoneme conversion using MeCab for morphological analysis and custom phonological rules based on Korean Standard Pronunciation.
Main Class
- class kokorog2p.ko.KoreanG2P(language: str = 'ko', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'ko_core_news_sm', load_silver: bool = True, load_gold: bool = True, use_dict: bool = True, group_vowels: bool = False, to_syl: bool = False, version: str = '1.0', **kwargs)
Bases: G2PBase
Korean G2P using MeCab and Korean phonological rules.
This class converts Korean text to phonemes in seven steps:
1. Idiom/abbreviation replacement
2. English to Hangul conversion
3. MeCab POS tagging
4. Number spelling
5. Hangul decomposition
6. Phonological rules application
7. Jamo composition
- Example:
>>> g2p = KoreanG2P()
>>> tokens = g2p("안녕하세요")
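Steps 5 and 7 of the pipeline above (Hangul decomposition and jamo composition) can be illustrated with plain Unicode arithmetic. The following is a minimal sketch using only the standard Hangul syllable layout; the names `decompose` and `compose` are hypothetical and are not taken from kokorog2p or g2pK, whose internal implementation may differ.

```python
# Sketch of Hangul syllable <-> conjoining jamo conversion.
# Assumption: this mirrors the Unicode standard layout, not kokorog2p's code.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables


def decompose(text: str) -> str:
    """Split each precomposed syllable into lead, vowel, and optional tail jamo."""
    out = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < S_COUNT:
            lead, rest = divmod(idx, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail index 0 means "no final consonant"
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)


def compose(jamo: str) -> str:
    """Recombine lead + vowel (+ tail) jamo sequences into syllables."""
    out = []
    i = 0
    while i < len(jamo):
        lead = ord(jamo[i]) - L_BASE
        has_vowel = i + 1 < len(jamo) and 0 <= ord(jamo[i + 1]) - V_BASE < V_COUNT
        if 0 <= lead < L_COUNT and has_vowel:
            vowel = ord(jamo[i + 1]) - V_BASE
            tail = 0
            if i + 2 < len(jamo) and 0 < ord(jamo[i + 2]) - T_BASE < T_COUNT:
                tail = ord(jamo[i + 2]) - T_BASE
            out.append(chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail))
            i += 3 if tail else 2
        else:
            out.append(jamo[i])
            i += 1
    return "".join(out)


jamo = decompose("안녕하세요")
print(len(jamo), compose(jamo))
```

In the real pipeline the phonological rules of step 6 would operate on the decomposed jamo between these two calls; here the round trip simply returns the input.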
- __init__(language: str = 'ko', use_espeak_fallback: bool = False, use_goruut_fallback: bool = False, use_spacy: bool = False, spacy_model: str = 'ko_core_news_sm', load_silver: bool = True, load_gold: bool = True, use_dict: bool = True, group_vowels: bool = False, to_syl: bool = False, version: str = '1.0', **kwargs) -> None
Initialize the Korean G2P.
- Args:
language: Language code (e.g., 'ko', 'ko-kr').
use_espeak_fallback: Whether to use espeak for unknown words. Not typically used for Korean. Defaults to False.
use_goruut_fallback: Whether to use goruut for unknown words. Not typically used for Korean. Defaults to False.
use_spacy: Reserved for API consistency. Korean uses the g2pK backend for tokenization and phonemization.
spacy_model: Reserved for API consistency when use_spacy is enabled.
load_silver: Reserved for API consistency. Korean does not use dictionary tiers. Defaults to True.
load_gold: Reserved for API consistency. Korean does not use dictionary tiers. Defaults to True.
use_dict: Whether to use the MeCab dictionary for POS tagging. Defaults to True. If False, MeCab annotation is skipped.
group_vowels: If True, merge similar vowels (e.g., ㅐ->ㅔ). Defaults to False.
to_syl: If True, compose jamo back into syllables. Defaults to False (returns decomposed jamo).
**kwargs: Additional arguments.
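To make the group_vowels and to_syl descriptions concrete, here is a small standard-library sketch of what the two options describe. It is an illustration only: `VOWEL_MERGES`, `merge_vowels`, and `to_syllables` are hypothetical names, the table holds just the single ㅐ->ㅔ pair mentioned above, and the library's actual merge table and code path may differ.

```python
import unicodedata

# Hypothetical merge table: only the ㅐ -> ㅔ pair comes from the parameter
# description; any further pairs used by the library are not known here.
VOWEL_MERGES = {"\u1162": "\u1166"}  # conjoining jamo ᅢ (ae) -> ᅦ (e)


def merge_vowels(jamo: str) -> str:
    """Replace each vowel jamo listed in VOWEL_MERGES (group_vowels-like step)."""
    return "".join(VOWEL_MERGES.get(ch, ch) for ch in jamo)


def to_syllables(jamo: str) -> str:
    """Compose conjoining jamo back into syllables (to_syl-like step)."""
    return unicodedata.normalize("NFC", jamo)


decomposed = unicodedata.normalize("NFD", "개")  # lead ᄀ + vowel ᅢ
merged = merge_vowels(decomposed)                # vowel becomes ᅦ
print(to_syllables(merged))
```

Without the final composition step the result stays as separate jamo, which corresponds to the documented default of to_syl=False.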
- property g2pk
Lazy initialization of g2pK backend.
- __call__(text: str) -> list[GToken]
Convert Korean text to tokens with phonemes.
- Args:
text: Input Korean text to convert.
- Returns:
List of GToken objects with phonemes.
Examples
from kokorog2p.ko import KoreanG2P
g2p = KoreanG2P(language="ko-kr")
tokens = g2p("안녕하세요!")
for token in tokens:
    print(f"{token.text} -> {token.phonemes}")
Implementation
The Korean G2P implementation is based on g2pK by kyubyong and uses:
- MeCab for morphological analysis
- Korean Standard Pronunciation rules
- Jamo-to-IPA conversion for phoneme output
Reference: https://github.com/kyubyong/g2pK