Code readability is an important aspect of software quality because it can significantly affect maintenance effort. Large language models (LLMs) have been used to evaluate code readability. However, developers at different skill levels apply different readability criteria, so LLM-based evaluations need to be personalized. This study proposes two methods for calibrating readability evaluations using collaborative filtering and bandit algorithms (BAs). The experimental results demonstrate the need for personalizing LLM-based evaluations and indicate that the proposed methods are effective for this calibration.
Hamamoto et al. (Thu,) studied this question.