August 31, 2024 · Open Access

Does Alignment Tuning Really Break LLMs' Internal Confidence?

Abstract

Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods. Initial analysis showed that the relationship between alignment and calibration is not always a trade-off, but under stricter analysis conditions, we found the alignment process consistently harms calibration. This highlights the need for (1) a careful approach when measuring model confidences and calibration errors and (2) future research into algorithms that can help LLMs to achieve both instruction-following and calibration without sacrificing either.
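The abstract does not say which calibration metrics the study uses. As an illustration of what a "calibration error" measures, the sketch below computes expected calibration error (ECE) with equal-width bins, one widely used metric and an assumption here, not necessarily the authors' choice. The function name and its inputs (a list of model confidences in [0, 1] and a parallel list of correctness flags) are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the sample-weighted mean of
    |accuracy - mean confidence| over the non-empty bins.

    Illustrative sketch only; ECE is assumed here, the abstract
    does not name the metrics the study actually reports.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by its share of all predictions.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that answers with confidence 0.75 and is right three times out of four scores an ECE of 0 on those answers. The value depends on how the confidences are extracted and binned, which is one reason the abstract calls for care when measuring confidences and calibration errors.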

Cite This Study

Oh et al. (2024) studied this question.

synapsesocial.com/papers/68e5a2bab6db64358753cdfe https://doi.org/10.48550/arxiv.2409.00352
