This letter proposes a new Kuzushiji transcription framework that integrates optical character recognition (OCR) with automatic speech recognition (ASR) of read speech via hiragana-level fusion, without requiring additional model training. The framework uses the transcriber's read speech as an additional modality to guide the selection of beam-search OCR hypotheses. Each OCR candidate is scored by its phonetic similarity, at the hiragana-sequence level, to the ASR output for the corresponding read speech. Evaluation results show that the proposed framework reduces the character error rate compared with conventional OCR-only Kuzushiji transcription.
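The hypothesis-selection step described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the letter: the `OcrCandidate` fields, the use of normalized Levenshtein distance as the phonetic similarity, the linear combination with a hypothetical `weight` parameter, and the toy candidates. The hiragana reading of each OCR candidate is assumed to be available already.

```python
from dataclasses import dataclass


@dataclass
class OcrCandidate:
    text: str         # transcription hypothesis from OCR beam search
    reading: str      # hiragana reading of the hypothesis (assumed to be given)
    ocr_score: float  # OCR log-score of the hypothesis (assumed)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def phonetic_similarity(reading: str, asr_hiragana: str) -> float:
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    longest = max(len(reading), len(asr_hiragana))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(reading, asr_hiragana) / longest


def select_candidate(candidates, asr_hiragana, weight=1.0):
    """Pick the candidate maximizing OCR score plus weighted similarity."""
    return max(
        candidates,
        key=lambda c: c.ocr_score
        + weight * phonetic_similarity(c.reading, asr_hiragana),
    )


# Toy example: the OCR-only ranking prefers the second hypothesis,
# but the read-speech ASR output supports the first one.
candidates = [
    OcrCandidate(text="花の色", reading="はなのいろ", ocr_score=-0.9),
    OcrCandidate(text="花の邑", reading="はなのむら", ocr_score=-0.8),
]
best = select_candidate(candidates, asr_hiragana="はなのいろ")
print(best.text)
```

Because the fusion in this sketch only rescores existing beam-search hypotheses, neither the OCR model nor the ASR model is retrained, which matches the training-free property stated above. Setting `weight=0.0` falls back to the OCR-only ranking.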
Zhang et al. (Thu,) studied this question.