What question did this study set out to answer?

This research aims to enhance error correction in Japanese automatic speech recognition by incorporating pitch accent information into a large language model.

May 14, 2026

Error correction for Japanese automatic speech recognition by large language model incorporating pitch accent information

Key Points

Investigated the use of pre-trained large language models, combined with pitch accent information, for ASR error correction.
Combined the N-best hypotheses and the corresponding pitch accent information generated by ASR to fine-tune the LLM.
Designed input prompts for the LLM from the N-best hypotheses and pitch accent information generated by Whisper.
Incorporating pitch accent information significantly reduced over-correction errors compared with text-only LLM-based correction.
The fine-tuned large language model showed improved accuracy in distinguishing similar-sounding Japanese words.
Evaluation metrics indicated a marked improvement in error correction performance.
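The key points describe fine-tuning data that pairs N-best hypotheses with pitch accent information. A minimal sketch of how one such training record might be assembled is shown below. The prompt wording, the `Hypothesis` class, the `build_prompt` and `make_training_record` helpers, and the accent notation (`[n]` = accent nucleus position, `[0]` = flat) are all hypothetical choices for illustration, not the format used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One ASR N-best entry (hypothetical structure)."""
    text: str    # transcript for this hypothesis
    accent: str  # pitch accent annotation for the same hypothesis (format assumed)


# Hypothetical instruction wording; the paper's actual prompt is not reproduced here.
INSTRUCTION = (
    "Correct the Japanese ASR output. Each hypothesis is followed by its "
    "pitch accent annotation."
)


def build_prompt(hyps: list[Hypothesis]) -> str:
    """Interleave each N-best hypothesis with its pitch accent annotation."""
    lines = [INSTRUCTION, ""]
    for rank, hyp in enumerate(hyps, start=1):
        lines.append(f"Hypothesis {rank}: {hyp.text}")
        lines.append(f"Accent {rank}: {hyp.accent}")
    lines += ["", "Corrected transcript:"]
    return "\n".join(lines)


def make_training_record(hyps: list[Hypothesis], reference: str) -> dict[str, str]:
    """Pair the prompt with the reference transcript as the fine-tuning target."""
    return {"prompt": build_prompt(hyps), "completion": reference}


# Example: two hypotheses that differ only in the kanji chosen for "hashi".
# Both carry the same (hypothetical) accent annotation observed from the audio.
nbest = [
    Hypothesis(text="箸を渡る", accent="はし[2] を わたる"),
    Hypothesis(text="橋を渡る", accent="はし[2] を わたる"),
]
record = make_training_record(nbest, reference="橋を渡る")
print(record["prompt"])
```

Interleaving each hypothesis with its own annotation keeps the text and accent aligned per rank, so the model can compare competing spellings against the same acoustic evidence.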

Abstract

Recent studies have shown the effectiveness of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, existing LLM-based error correction approaches use only textual information and neglect pitch accent information, which leads to over-correction. In Japanese, words such as “Hashi (Chopsticks)” and “Hashi (Bridge)” can be distinguished by their pitch accent, so pitch accent information is important for error correction in Japanese ASR. In this paper, we investigate the use of a pre-trained LLM to improve the outputs of Japanese ASR. In particular, we aim to improve error correction by using the N-best hypotheses and pitch accent information generated by ASR as input to the LLM. We fine-tune the LLM and design an input prompt that combines the N-best hypotheses with the corresponding pitch accent information generated by Whisper. Through this evaluation, we aim to clarify the effect of using pitch accent information on ASR error correction in Japanese.
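To make the “Hashi” example concrete, the toy sketch below shows why accent information can block an over-correction that text alone would allow. It assumes a three-entry lexicon using the accent nucleus positions commonly listed for Tokyo Japanese (箸 = 1, 橋 = 2, 端 = 0) and assumes the observed nucleus is already known; how it is estimated from audio is not covered. The `accent_consistent` filter is a hypothetical lexicon lookup, not the paper's method, which instead supplies accent information to the LLM through its prompt.

```python
# Lexical accent of three Tokyo-Japanese words read "はし" (hashi).
# The number is the accent nucleus: the mora after which pitch drops (0 = flat).
ACCENT_LEXICON: dict[str, tuple[str, int]] = {
    "箸": ("はし", 1),  # chopsticks: high-low
    "橋": ("はし", 2),  # bridge: low-high, drops on a following particle
    "端": ("はし", 0),  # edge: low-high, no drop
}


def accent_consistent(candidates: list[str], reading: str, nucleus: int) -> list[str]:
    """Keep only candidate words whose lexical accent matches the observed one.

    Hypothetical toy filter for illustration; words missing from the
    lexicon are dropped because their accent cannot be checked.
    """
    return [w for w in candidates if ACCENT_LEXICON.get(w) == (reading, nucleus)]


# A text-only corrector might swap 橋 for 箸 from context alone; an observed
# nucleus of 2 rules that rewrite out.
print(accent_consistent(["箸", "橋", "端"], "はし", 2))  # ['橋']
```

The three words share the same kana, so any purely textual corrector sees them as interchangeable; only the accent pattern separates them.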
