What question did this study set out to answer?

This research aims to enhance the accuracy of frustration detection in Indonesian educational AI systems.

March 26, 2026 | Open Access

Improving Indonesian Emotion Classification for Educational AI: LoRA Fine-tuning with Appraisal-Based Frustration Composite Scoring

Key Points

Applied Low-Rank Adaptation (LoRA) fine-tuning to an IndoBERT model
Utilized a large-scale public Indonesian emotion dataset with 73,672 entries
Implemented an Appraisal-Based Frustration Composite Scoring mechanism with optimized thresholds
Achieved a Macro F1 score of 0.84, a 3.8× improvement over the baseline's 0.22
Binary frustration detection reached an F1 score of 0.86 and an accuracy of 0.91
Error analysis identified confusion at the frustrated–angry boundary and noted domain mismatch as a limitation
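
The LoRA step listed above can be illustrated with a minimal pure-Python sketch. LoRA keeps a pretrained weight matrix W frozen and trains only two small factors, B and A, whose product forms a low-rank update scaled by alpha / r. The rank, scaling factor, and 768-dimensional layer size used in the note below are illustrative assumptions, not hyperparameters reported in this summary, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def lora_effective_weight(W: Matrix, A: Matrix, B: Matrix,
                          alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A); W itself is never modified."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy 2 x 2 layer with rank 1 (assumed values, for illustration only).
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]          # rank x d_in
B_init = [[0.0], [0.0]]    # d_out x rank; zero init leaves the model unchanged
B_trained = [[1.0], [2.0]]

W_start = lora_effective_weight(W, A, B_init, alpha=2.0, rank=1)
W_tuned = lora_effective_weight(W, A, B_trained, alpha=2.0, rank=1)
```

Under these assumptions, a single 768 × 768 projection has 589,824 weights, while its rank-8 LoRA factors have `lora_param_count(768, 768, 8)` = 12,288 trainable parameters, which is why this style of fine-tuning is considered parameter-efficient.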

Abstract

Emotion classification in educational AI systems requires reliable detection of domain-specific affective states, particularly frustration, which is consistently associated with disengagement and reduced learning outcomes. Prior work introduced an IndoBERT-based classifier for Indonesian educational chatbot interactions, but reported poor multiclass performance (accuracy 0.31, Macro F1 0.22), attributed to a small, imbalanced training corpus and the absence of efficient fine-tuning techniques. This paper presents a technical follow-up addressing these limitations through two contributions. First, we apply Low-Rank Adaptation (LoRA) fine-tuning to the baseline model using a large-scale public Indonesian emotion dataset (73,672 entries), with a theoretically grounded label remapping strategy informed by Lazarus's Cognitive Appraisal Theory and the Frustration-Aggression Hypothesis. Second, we propose an Appraisal-Based Frustration Composite Scoring mechanism that aggregates multiclass emotion probabilities using psychologically motivated weights to produce a binary frustrated/non-frustrated classification at an empirically optimized threshold. The fine-tuned model achieves Macro F1 of 0.84, a 3.8× improvement over baseline, with binary frustration detection reaching F1 0.86 and accuracy 0.91. Error analysis reveals systematic confusion at the frustrated–angry boundary, consistent with theoretical accounts of frustration escalation, and identifies domain mismatch as a secondary limitation motivating future domain-specific fine-tuning. The model is publicly available at https://huggingface.co/ZenyxS/indobert-emotion-v2-lora
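
The composite scoring idea in the abstract (a weighted sum of multiclass probabilities, thresholded into a binary decision) might look roughly like the sketch below. The label set, the weights, and the candidate thresholds are all hypothetical placeholders chosen for illustration; the paper's actual weights and optimized threshold are not stated in this summary.

```python
from typing import Dict, Iterable, List

# Hypothetical labels and weights; NOT the values used in the paper.
WEIGHTS: Dict[str, float] = {
    "frustrated": 1.0,
    "angry": 0.6,
    "sad": 0.3,
    "neutral": 0.0,
    "happy": 0.0,
}


def composite_score(probs: Dict[str, float],
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of class probabilities; labels without a weight count as 0."""
    return sum(weights.get(label, 0.0) * p for label, p in probs.items())


def is_frustrated(probs: Dict[str, float], threshold: float) -> bool:
    """Binary decision: frustrated if the composite score reaches the threshold."""
    return composite_score(probs) >= threshold


def binary_f1(y_true: List[bool], y_pred: List[bool]) -> float:
    """F1 for the positive (frustrated) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: List[float], y_true: List[bool],
                   candidates: Iterable[float]) -> float:
    """Pick the candidate threshold that maximizes F1 on held-out scores."""
    return max(candidates,
               key=lambda t: binary_f1(y_true, [s >= t for s in scores]))


# Toy validation data (made up) to show the threshold sweep.
val_scores = [0.1, 0.4, 0.6, 0.9]
val_labels = [False, False, True, True]
chosen = best_threshold(val_scores, val_labels, candidates=[0.3, 0.5, 0.7])
```

In practice the probabilities would come from the fine-tuned classifier's softmax output, and the threshold would be selected on a validation split rather than on the test data used to report the final scores.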

Cite This Study

M. Fabian Prasetyo studied this question.

synapsesocial.com/papers/69c4cdcdfdc3bde44891a8ae https://doi.org/10.5281/zenodo.19202133
