March 21, 2024

Impact of Heterogeneous Spectral Features for enhanced low-resource Speech Recognition System under mismatched conditions

Puneet Bawa (The NorthCap University), Virender Kadyan (University of Petroleum and Energy Studies), Archana Mantri (Advanced Numerical Research and Analysis Group)

Abstract

Developing an Automatic Speech Recognition (ASR) system for children is difficult because physical traits, articulation patterns, and speaking mannerisms vary substantially from child to child. Moreover, the limited availability of large quantities of children's speech data may be linked to variations in vocal-tract geometry arising from anatomical and physiological factors. This study addresses these issues by investigating a speech recognition system designed for children under low-resource conditions. It uses novel methods for extracting heterogeneous features from the input audio signal, based on both raw and central moments. To mitigate data scarcity, several training systems are built using perturbation methods, and the modeling parameters are optimized to improve the effectiveness of these models. Together, these efforts yield a significant improvement in system performance: a hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) system using fused front-end features achieves a relative improvement of 21.36% over the baseline systems.
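As a reading aid, the relative improvement quoted in the abstract can be sketched as follows. This assumes the underlying metric is an error rate where lower is better (for example, word error rate), which the abstract does not state explicitly. The function name `relative_improvement` and the numbers used are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction of an error rate relative to a baseline.

    Assumes a lower-is-better metric such as word error rate (an assumption,
    since the abstract does not name the metric).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates for illustration only, not results from the paper.
print(f"{relative_improvement(20.0, 15.0):.2f}%")  # prints "25.00%"
```

Under this definition, a relative improvement of 21.36% would mean the proposed system's error rate is roughly 78.64% of the baseline's error rate.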
