Text normalization (TN), the process of converting non-standard words into their spoken equivalents, is a fundamental pre-processing step for text-to-speech (TTS) systems. While substantial progress has been made in TN for well-resourced languages, low-resource languages such as Bangla have received limited attention. We present Kingfisher, a three-stage hybrid framework that combines LLM-based tokenization and semiotic class annotation, lexicon-driven context-aware verbalization, and error correction to build an accurate Bangla text normalizer. Experimental evaluations across diverse Bangla texts show that Kingfisher achieves an overall accuracy of 96% (confidence interval 95%–97%), significantly outperforming Sparrowhawk, the only publicly available Bangla text normalizer. To support further research, we release the Bangla text normalization dataset and make the source code of the text normalization system publicly available at https://github.com/Rajan-sust/BanglaTextNorm, offering a substantial contribution to the Bangla speech technology community.
Raju et al. studied this question.
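To make the three-stage structure described above concrete, the following is a minimal sketch of how such a pipeline could be wired together. It is not Kingfisher's implementation: a regular expression stands in for the LLM-based tokenizer and annotator, the two lexicons are tiny hypothetical examples, digits are read one by one rather than as full cardinals, and the error-correction stage only cleans up whitespace. All names (`Token`, `annotate`, `verbalize`, `correct`, `normalize`) are assumptions introduced for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    semiotic_class: str  # "DIGITS", "SYMBOL", or "PLAIN" in this toy sketch


# Hypothetical toy lexicons; a real system would use far larger resources.
DIGIT_LEXICON = {"১": "এক", "২": "দুই", "৩": "তিন"}
SYMBOL_LEXICON = {"%": "শতাংশ"}

# Bangla digit runs, the percent sign, or any other run of non-space characters.
TOKEN_PATTERN = re.compile(r"[০-৯]+|%|[^\s০-৯%]+")


def annotate(text: str) -> list[Token]:
    """Stage 1 stand-in: split text and assign a semiotic class by rule.

    Assumption: the paper uses an LLM here; a regex is used only to keep
    the sketch self-contained.
    """
    tokens = []
    for piece in TOKEN_PATTERN.findall(text):
        if "০" <= piece[0] <= "৯":
            tokens.append(Token(piece, "DIGITS"))
        elif piece in SYMBOL_LEXICON:
            tokens.append(Token(piece, "SYMBOL"))
        else:
            tokens.append(Token(piece, "PLAIN"))
    return tokens


def verbalize(token: Token) -> str:
    """Stage 2 stand-in: lexicon lookup selected by semiotic class."""
    if token.semiotic_class == "DIGITS":
        # Toy behaviour: read digit by digit; unknown digits pass through.
        return " ".join(DIGIT_LEXICON.get(ch, ch) for ch in token.text)
    if token.semiotic_class == "SYMBOL":
        return SYMBOL_LEXICON[token.text]
    return token.text


def correct(words: list[str]) -> str:
    """Stage 3 stand-in: join the pieces and collapse stray whitespace."""
    return re.sub(r"\s+", " ", " ".join(words)).strip()


def normalize(text: str) -> str:
    """Run the three stages in order."""
    return correct([verbalize(tok) for tok in annotate(text)])


if __name__ == "__main__":
    print(normalize("১২%"))
```

Keeping the stages as separate functions means each stand-in can be swapped independently, for example replacing `annotate` with a model-backed annotator without touching the verbalizer.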