February 13, 2024 · Open Access

Enhancing Amharic Speech Recognition in Noisy Conditions through End-to-End Deep Learning

Abstract

Speech recognition, also known as automatic speech recognition (ASR), is a technology that enables software to transcribe spoken language into text. Existing Amharic ASR methods, however, require multiple separate blocks, such as language, acoustic, and pronunciation models with dictionaries, which are time-consuming to build and can degrade performance. This study proposes an end-to-end approach that replaces much of the speech pipeline with a single recurrent neural network (RNN) architecture. The architecture is a hybrid that combines a convolutional neural network (CNN) with an RNN and is trained with a connectionist temporal classification (CTC) loss function. We conducted several experiments on noisy audio data comprising 20,000 valid sentences. The model was evaluated using the word error rate (WER) metric and achieved a WER of 7% on noisy data. The approach reduces the human effort required to create dictionaries and can improve the efficiency and accuracy of ASR systems, making them more practical for real-world applications.
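The abstract names two pieces that are easy to illustrate in isolation: decoding CTC output and scoring with WER. The following is a minimal sketch, not the authors' implementation. It assumes the common greedy CTC decoding rule (merge repeated frame labels, then drop blanks) and the standard WER definition (word-level edit distance divided by the number of reference words). The function names `ctc_greedy_collapse` and `word_error_rate`, and the choice of `0` as the blank index, are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse per-frame label ids: merge repeats, then drop blanks.

    `blank` is an assumed index for the CTC blank symbol.
    """
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)
```

Under this definition, a reported WER of 7% corresponds to roughly 7 word-level edits per 100 reference words. A blank between two identical labels is what lets CTC emit a doubled symbol, which is why repeats are merged before blanks are removed.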

Cite This Study

Ejigu et al. (February 13, 2024) studied this question.

synapsesocial.com/papers/68e79572b6db643587706236 https://doi.org/10.20944/preprints202402.0754.v1