
May 8, 2026 | Open Access

Leveraging AI for competency assessments

Key Points

This research aims to evaluate the efficacy of AI in transforming actuarial case studies into individual assessments and scoring them reliably.
Analyzed 144 AI-generated competency assessments covering the Society of Actuaries’ eight core competencies.
Used Generalizability Theory to select optimized grading panels and assess their reliability.
Investigated iterative prompt refinement effects on assessment quality.
Achieved strong reliability (G=0.719 and G=0.740) with optimized three- and four-grader panels, respectively.
Iterative prompt refinements showed medium-sized effects on assessment quality.
Documented in-group bias: AI graders favored assessments generated by their own model family despite anonymization.
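The reliability figures in these Key Points are Generalizability Theory (G) coefficients, and "optimizing" a grading panel means choosing which graders to keep. The sketch below shows one common way to do that, under assumptions this summary does not confirm: a fully crossed one-facet design (every grader scores every assessment, with assessments as the objects of measurement), variance components estimated from two-way ANOVA mean squares, the relative coefficient var_p / (var_p + var_pr_e / n_r), and an exhaustive search over k-grader subsets. All function names and the toy data are hypothetical; the paper's actual design and search procedure may differ.

```python
from itertools import combinations
from statistics import mean


def variance_components(scores):
    """Estimate p x r variance components from a fully crossed score matrix.

    scores[i][j] is the score grader j gave assessment i (no missing cells).
    Returns (var_p, var_r, var_pr_e) from ANOVA expected mean squares,
    truncating negative estimates to zero.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    p_means = [mean(row) for row in scores]
    r_means = [mean(row[j] for row in scores) for j in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_tot - ss_p - ss_r  # interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    var_pr_e = max(ms_pr, 0.0)
    return var_p, var_r, var_pr_e


def g_coefficient(scores):
    """Relative G coefficient for the mean score over all graders in `scores`."""
    var_p, _, var_pr_e = variance_components(scores)
    n_r = len(scores[0])
    denom = var_p + var_pr_e / n_r
    return var_p / denom if denom > 0 else 0.0


def best_panel(scores, grader_names, k):
    """Try every k-grader subset (k >= 2); return (G, names) with the highest G."""
    best = None
    for cols in combinations(range(len(grader_names)), k):
        sub = [[row[j] for j in cols] for row in scores]
        g = g_coefficient(sub)
        if best is None or g > best[0]:
            best = (g, [grader_names[j] for j in cols])
    return best


if __name__ == "__main__":
    # Hypothetical data: graders A and B agree exactly, grader C is erratic.
    scores = [[1, 1, 3], [2, 2, 1], [3, 3, 2], [4, 4, 1]]
    g, panel = best_panel(scores, ["A", "B", "C"], 2)
    print(panel, round(g, 3))  # ['A', 'B'] 1.0
```

Exhaustive search stays cheap for a small grader pool (choosing 4 of, say, 8 graders is only 70 subsets). Selecting a panel and reporting its G on the same data tends to flatter the estimate, and, as the study itself cautions, dropping graders can shift which assessments look strong, so a held-out split or a check of who gets excluded is worth considering.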

Abstract

Strong business skills, such as communication, professional judgment, and stakeholder management, have become a key differentiator for actuarial trainees entering the workplace and are correlated with future success. Although case studies have historically proven effective at developing these skills, existing resources are limited and are typically structured as multi-week team projects that are difficult to scale, individualize, or align with specific competencies. To address this gap, this paper examines whether AI models can (i) efficiently transform a small set of comprehensive actuarial cases into many brief individual assessments, each targeting a single competency; and (ii) score these assessments with adequate psychometric quality. Using 144 AI-generated assessments covering the Society of Actuaries’ eight core competencies, we achieve strong reliability (G=0.719 and G=0.740) with optimized three- and four-grader panels, respectively, selected through Generalizability Theory analysis. Our experiments show that iterative prompt refinement improves assessment quality: later prompt versions outperform initial versions, a medium-sized effect. However, we document critical challenges. All AI graders exhibit in-group bias, systematically favoring assessments generated by their own model family despite anonymization. Additionally, graders may engage in algorithmic gaming, producing low-entropy scoring patterns with strong halo effects that bear no relationship to actual assessment quality. Excluding unreliable graders from a model family partially explains the apparent underperformance of assessments from that same family, illustrating how grader selection can inadvertently create bias. We propose a hybrid approach that combines carefully selected AI grader panels with human moderators, addressing these documented biases while retaining the efficiency gains of automated assessment.
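Two of the grader problems described in this abstract, low-entropy scoring and in-group favoritism, can be screened for with simple descriptive statistics. The sketch below is an illustrative assumption, not the paper's method: it measures a grader's score diversity as Shannon entropy in bits (near 0 when a grader gives almost every assessment the same score) and its in-group tendency as the mean score it gives its own model family's assessments minus the mean it gives all other families. The function names and the (family, score) input format are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import mean


def score_entropy(scores):
    """Shannon entropy (bits) of the scores one grader assigned.

    0.0 means every assessment received the same score; a grader spreading
    scores evenly over k distinct values reaches log2(k).
    """
    counts = Counter(scores)
    n = len(scores)
    # Sum of p * log2(1/p) over each observed score value, with p = c / n.
    return sum((c / n) * log2(n / c) for c in counts.values())


def in_group_gap(ratings, grader_family):
    """Own-family mean score minus other-family mean score for one grader.

    ratings: list of (generator_family, score) pairs (hypothetical format).
    Requires at least one own-family and one other-family rating.
    """
    own = [s for family, s in ratings if family == grader_family]
    other = [s for family, s in ratings if family != grader_family]
    return mean(own) - mean(other)


if __name__ == "__main__":
    # Hypothetical grader that rates every assessment 4: entropy 0.0.
    print(score_entropy([4, 4, 4, 4, 4, 4]))  # 0.0
    # Hypothetical ratings from a grader belonging to model family "X".
    ratings = [("X", 5), ("X", 4), ("Y", 3), ("Z", 3)]
    print(in_group_gap(ratings, "X"))  # 1.5
```

A raw gap like this confounds favoritism with genuine quality differences between model families, so it is only meaningful when compared with how graders from other families score the same assessments.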
