What question did this study set out to answer?

The aim is to improve the identification of rare emotions in conversational data by addressing contextual sparsity and class imbalance.

April 3, 2026 · Open Access

Addressing contextual sparsity in multimodal emotion recognition using speaker-focused LLM contextualization and emotion-driven augmentation

Key points

Developed a data-centric framework for emotion recognition in conversations
Fine-tuned a 7B-parameter small language model to generate context-aware summaries
Implemented soft context injection to enhance utterance paraphrasing and expressive speech synthesis
Trained a multimodal autoencoder on text, summaries, and speech embeddings
The proposed method achieved an F1 score improvement of over 35% for rare emotion classes
Outperformed existing baselines without degrading overall accuracy
Demonstrated the effectiveness of generative augmentation and soft prompting in affective computing
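
The last key points contrast per-class F1 on rare emotions with overall accuracy. The sketch below shows why the two are reported separately. It is a minimal illustration, not the authors' evaluation code: the function names `per_class_f1` and `accuracy` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): 8 neutral turns and 2 "fear" turns,
# with one of the two rare turns misclassified as neutral.
y_true = ["neutral"] * 8 + ["fear"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "fear"]

scores = per_class_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(scores, acc)
```

On this toy data a single error on the rare class barely moves overall accuracy but lowers the rare-class F1 noticeably, which is why a method aimed at rare emotions needs both numbers.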

Abstract

Multimodal Emotion Recognition in Conversations (ERC) is challenging because rare emotions suffer from contextual sparsity and class imbalance: they are often diluted by frequent neutral or common emotional expressions. To address this, we propose a data-centric framework that enhances the representation of underrepresented emotions via context-aware augmentation. Our approach uses a fine-tuned 7B Small Language Model to generate emotion-induced conversation summaries. These summaries are then used for soft context injection during augmentation, guiding the generation of utterance paraphrases and corresponding expressive speech through neural speech synthesis. As a result, only the dialogue turns that belong to rare emotions are augmented, rather than the entire conversation. A multimodal autoencoder-based fusion model is then trained on text, summary, and speech embeddings to identify emotions in conversations. Experiments on benchmark datasets (MELD, EmoryNLP, and IEMOCAP) demonstrate that our method achieves significant improvements in detecting rare emotion classes (F1 score > 35%) and outperforms existing baselines without degrading overall accuracy. The results show the effectiveness of generative augmentation and soft prompting for building context-aware solutions in affective computing.

• Proposed a data-centric framework for emotion recognition in conversations.
• Addressed context sparsity using LLM-guided soft prompting and speaker cues.
• Fine-tuned LLMs to generate emotion-rich, context-aware summaries of utterances.
• Enhanced rare emotion classes via LLM-based paraphrasing and expressive speech.
• Achieved notable F1 gains, especially for underrepresented emotion classes.
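
The abstract states that augmentation targets only the dialogue turns labeled with rare emotions, not whole conversations. A minimal sketch of that selection step follows. The frequency threshold, the dictionary field names (`speaker`, `text`, `emotion`), and the helper names are assumptions for illustration; the abstract does not say how rare classes are chosen.

```python
from collections import Counter


def rare_labels(turns, threshold=0.10):
    """Labels whose share of all turns is below `threshold` (assumed criterion)."""
    counts = Counter(turn["emotion"] for turn in turns)
    total = len(turns)
    return {label for label, n in counts.items() if n / total < threshold}


def select_turns_for_augmentation(turns, threshold=0.10):
    """Keep only the turns whose emotion label is rare; the rest stay untouched."""
    rare = rare_labels(turns, threshold)
    return [turn for turn in turns if turn["emotion"] in rare]


# Hypothetical toy conversation: 9 neutral turns, 2 joy turns, 1 fear turn.
turns = (
    [{"speaker": "A", "text": f"utterance {i}", "emotion": "neutral"} for i in range(9)]
    + [{"speaker": "B", "text": f"great news {i}", "emotion": "joy"} for i in range(2)]
    + [{"speaker": "B", "text": "I am scared.", "emotion": "fear"}]
)

selected = select_turns_for_augmentation(turns)
print(selected)
```

In the described framework, the selected turns would then go to the paraphrasing and speech-synthesis stages together with the conversation summary; those stages are not sketched here because the abstract gives no implementation details for them.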
