Reliable evaluation of emotional expression in large language model (LLM) outputs remains methodologically under-specified, particularly for long-form generation where label-only correctness provides limited evidence of affective reliability. A claim-conditioned framework is introduced for cross-model comparison under matched elicitation conditions, with TEAS (Text Emotion Adherence Score) as its core continuous metric. Defined in a shared prototype space induced by a frozen reference encoder, TEAS combines affective separability with entropy-aware uncertainty, enabling reliability assessment beyond discrete agreement within a fixed evaluator. Evaluation is conducted on a controlled synthetic corpus under a ground-truth-free, claim-conditioned protocol across four widely used LLM families (Gemini, GPT, Grok, and Mistral). In addition to overall comparative ordering, auxiliary diagnostic measures are reported to localize failure modes and support interpretation of model behavior, together with Holm-corrected pairwise comparisons, sequence-level drift analysis, and local hyperparameter sensitivity analysis. Empirical results show stable endpoint separation, aggregation-sensitive differences among close models, measurable sequence-level degradation, and stable relative orderings under tested local parameter variations. Overall, the study provides an interpretable and statistically grounded protocol for assessing emotion-expression reliability in LLM-generated text within a fixed reference space rather than as a human gold measure of emotional truth.
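The Holm-corrected pairwise comparisons mentioned above can be illustrated with a minimal sketch of the standard Holm step-down adjustment. This is not the study's own code: the function name `holm_adjust` and the input p-values are hypothetical placeholders, and the abstract does not say which test produced the raw p-values.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order.

    Illustrative sketch only; the name and interface are assumptions,
    not taken from the paper.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four pairwise comparisons (not real results).
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw={p:.3f}  holm={q:.3f}")
```

A comparison is declared significant at level alpha when its adjusted value is at most alpha, which controls the family-wise error rate across all pairwise tests.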
Ahmet Remzi Özcan studied this question.
synapsesocial.com/papers/69c771f08bbfbc51511e21c2 — DOI: https://doi.org/10.3390/math14071110
Ahmet Remzi Özcan
Bursa Technical University
Mathematics