Changelog
All notable changes to kokorog2p will be documented in this file.
Unreleased
Version 0.4.0 (2026-01-11)
Added
New ``strict`` parameter for error handling control in all G2P backends
strict=True (default): Raises a detailed RuntimeError when backends fail
strict=False (lenient mode): Logs errors and returns empty strings (backward compatible)
Available in: get_g2p(), EspeakG2P, GoruutG2P, EnglishG2P, and the base G2P class
Included in the cache key to ensure correct behavior for different strict modes
Early backend validation in EspeakG2P._validate_backend() to catch initialization errors immediately
Comprehensive logging throughout error handling paths
15 new tests in tests/test_ci_bug_fix.py covering strict/lenient modes and error scenarios
Detailed error handling documentation in README.md and docs/advanced.rst
CI/CD best practices guide for proper espeak-ng installation
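The strict and lenient modes listed above can be illustrated with a minimal, self-contained sketch. Every name in it (SketchG2P, _fake_backend, get_sketch_g2p) is a hypothetical stand-in, not kokorog2p's actual implementation. It only shows the general pattern: raise RuntimeError in strict mode, log and return an empty string in lenient mode, and make the flag part of the cache key.

```python
import logging

logger = logging.getLogger("g2p_sketch")


def _fake_backend(language, text):
    """Hypothetical backend call; fails for the made-up language 'xx'."""
    if language == "xx":
        raise OSError("voice 'xx' not found")
    return f"/{text}/"


class SketchG2P:
    """Illustrative stand-in for a G2P class with a strict flag."""

    def __init__(self, language, strict=True):
        self.language = language
        self.strict = strict

    def phonemize(self, text):
        try:
            return _fake_backend(self.language, text)
        except Exception as exc:
            if self.strict:
                # Strict mode: surface the failure with context attached.
                raise RuntimeError(
                    f"backend failed for language={self.language!r}, "
                    f"text={text!r}: {exc}"
                ) from exc
            # Lenient mode: log the error and fall back to an empty string.
            logger.error("backend failed for %r: %s", self.language, exc)
            return ""


_cache = {}


def get_sketch_g2p(language, strict=True):
    """Return a cached instance; strict is part of the cache key."""
    key = (language, strict)
    if key not in _cache:
        _cache[key] = SketchG2P(language, strict)
    return _cache[key]
```

Because strict is part of the key, a lenient instance created earlier is never handed back to a caller that asked for strict behavior.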
Changed
BREAKING CHANGE: Silent exception handling removed - errors now raise by default
Previous behavior: Exceptions were caught and empty strings returned silently
New behavior: Exceptions raise RuntimeError with detailed context (use strict=False for old behavior)
Affected methods: phonemize(), __call__(), lookup() in all backends
Improved error messages with actionable debugging information
Enhanced fallback logging in EspeakFallback and GoruutFallback classes
Updated documentation with comprehensive error handling examples
Fixed
Critical bug fix: Fixed silent failures in CI environments that returned empty strings instead of raising errors
Root cause: 8 locations with bare except Exception blocks that silently returned empty strings
Impact: Tests passed in CI even when backends failed completely
Solution: Proper error propagation with detailed error messages in strict mode
Files fixed: espeak_g2p.py (4 locations), goruut_g2p.py (3 locations), en/fallback.py (2 locations)
Fixed missing error context in backend initialization failures
Improved error handling for voice not found scenarios in espeak
Enhanced subprocess error reporting in espeak backend
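As a rough illustration of what more informative subprocess error reporting can look like, the sketch below wraps subprocess.run and converts the two most common failures (missing executable, non-zero exit code) into a RuntimeError that carries the exit code and stderr. The helper name run_tool is hypothetical, and this is not the code used in the espeak backend; it is a generic pattern under those assumptions.

```python
import subprocess


def run_tool(cmd):
    """Run cmd (a list of strings) and return its stdout as text.

    Hypothetical helper: failures are re-raised as RuntimeError with
    the exit code and stderr included in the message.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError as exc:
        # The executable itself is missing (e.g. not installed or not on PATH).
        raise RuntimeError(f"executable not found: {cmd[0]!r}") from exc
    except subprocess.CalledProcessError as exc:
        # The process ran but reported failure; keep its stderr for debugging.
        stderr = (exc.stderr or "").strip()
        raise RuntimeError(
            f"{cmd[0]!r} exited with code {exc.returncode}: {stderr}"
        ) from exc
    return proc.stdout
```

Chaining with "from exc" keeps the original exception in the traceback, so the low-level cause stays visible next to the higher-level message.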
Migration Guide
For users upgrading from v0.3.x or earlier:
If your code relied on silent failures (empty strings on errors), you have two options:
Recommended: Fix the underlying issues causing errors (e.g., install espeak-ng properly)
Quick fix: Use strict=False to maintain backward compatibility:
    from kokorog2p import get_g2p
    # Old behavior (silent failures)
    g2p = get_g2p("en-us", backend="espeak", strict=False)
For CI/CD environments:
Ensure espeak-ng is properly installed before running tests
Use strict mode (default) to catch configuration issues early
See docs/advanced.rst for CI best practices
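One way to catch a missing espeak-ng installation before any test runs is a small preflight check. The sketch below assumes the backend needs an espeak-ng executable on PATH; if your setup loads the shared library instead, the check would need to differ. The function name require_executable is hypothetical and is not part of kokorog2p.

```python
import shutil


def require_executable(name="espeak-ng"):
    """Return the resolved path of an executable, or fail loudly.

    Hypothetical preflight helper for CI: call it once at the start of a
    test session so a missing dependency is reported immediately.
    """
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} not found on PATH; install it before running the tests"
        )
    return path
```

Calling require_executable() from a session-level test fixture or at the top of a CI script makes a misconfigured environment fail in one obvious place rather than in many unrelated tests.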
Previous Unreleased Changes
Added
German G2P module with 738k+ entry dictionary
Czech G2P module with rule-based phonology
French G2P module with gold dictionary
Comprehensive test suite (469 tests including 37 new contraction tests)
Benchmarking framework for performance testing
Contraction merging for spaCy tokenizer in English G2P
Test coverage for single and double contractions (don’t, could’ve, I’d’ve, etc.)
Changed
Improved English contraction handling with intelligent token merging
Enhanced number conversion for all languages
Better error handling for missing dependencies
Updated documentation with multi-language support examples
Improved type annotations and mypy configuration
Fixed
Fixed contraction tokenization in English (don’t was incorrectly split as “Do” + “n’t”)
Fixed Chinese tone_sandhi import type annotation
Fixed GToken __post_init__ to handle None values for extension dict
Fixed stress marker handling in German
Improved phonological rules for Czech
Fixed documentation API references for English and French modules
Version 0.1.0 (Initial Release)
Added
Core G2P framework
English G2P (US and GB variants)
Chinese G2P with jieba and pypinyin
Japanese G2P with pyopenjtalk
espeak-ng backend support
goruut backend support (experimental)
Number and currency handling
Phoneme vocabulary encoding/decoding
Punctuation normalization
Word mismatch detection
Comprehensive API documentation
Test suite with 300+ tests
Features
Dictionary-based lookup with gold/silver tiers
POS-aware pronunciation for English
Automatic stress assignment
Multi-backend support
Caching for performance
Type hints throughout
Full IPA support