This study introduces a methodology for improving emotion recognition in Kashmiri speech that combines optimized feature selection with temporal attention mechanisms in Long Short-Term Memory (LSTM) networks. A feature selection process identified key acoustic features, including Mel Frequency Cepstral Coefficients (MFCCs), Linear Predictive Coding (LPC), and other relevant descriptors, as optimal for emotion classification. Adding temporal attention layers significantly improved the model's capacity to capture complex emotional patterns and temporal dynamics in the speech data. The proposed attention-augmented LSTM model achieved an accuracy of 90.2%, compared with 86% for the baseline LSTM model. Improvements in precision, recall, and F1-scores across multiple emotional categories further indicate that the attention mechanism helps capture subtle emotional variations. Beyond these performance gains, the study shows how attention-based temporal modeling can benefit low-resource languages such as Kashmiri, where linguistic and prosodic cues differ significantly from those of widely studied languages. The findings establish a methodological baseline for future speech emotion recognition (SER) deployments in digital domains, including chat-based systems, affect-aware agents, and other human-machine interfaces. They also underscore the model's ability to improve both the sensitivity and the specificity of emotion recognition systems, offering a robust and efficient framework for speech-based emotion analysis. Future work will extend the methodology to multilingual settings and incorporate multimodal information, enabling deeper analysis of emotional expression across diverse linguistic and cultural contexts.
Dar et al. studied this question.
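The abstract does not specify the form of the temporal attention layer, so the following is only a minimal sketch of one common variant, additive attention pooling over per-frame LSTM outputs. It illustrates the general idea of weighting time steps before classification and should not be read as the authors' implementation. The function names (`softmax`, `temporal_attention_pool`), the parameters `W`, `b`, and `v`, and the toy values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a 1-D sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(
    hidden_states: Sequence[Sequence[float]],
    W: Sequence[Sequence[float]],
    b: Sequence[float],
    v: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Additive attention over time (hypothetical layer, not from the paper).

    hidden_states: T vectors of size d, e.g. per-frame LSTM outputs.
    W: d_a x d projection matrix, b: d_a bias, v: d_a scoring vector.
    Returns (context vector of size d, attention weights of length T).
    """
    scores = []
    for h in hidden_states:
        # u = tanh(W h + b), then score = v . u
        u = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))

    # Normalize scores across time so the weights sum to 1.
    weights = softmax(scores)

    # Context vector: attention-weighted sum of the hidden states.
    d = len(hidden_states[0])
    context = [
        sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(d)
    ]
    return context, weights


# Toy example: three time steps, hidden size 2 (values are made up).
H = [[0.1, 0.0], [0.9, 0.4], [0.2, 0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
context, weights = temporal_attention_pool(H, W, b, v)
```

In a trained model, `W`, `b`, and `v` would be learned jointly with the LSTM, and the context vector would feed the emotion classifier in place of, or alongside, the final hidden state. In this toy example the second frame receives the largest weight because its projected activations are the largest.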