Abstract: We present multifaceted validity evidence for machine learning models, referred to in this research as automated video interview personality assessments (AVI‐PAs), that were trained on verbal data and interviewer ratings from low‐stakes interviews and applied to high‐stakes interviews to infer applicant personality. The predictive models used RoBERTa embeddings and binary unigrams as predictors. In Study 1 (N = 107), AVI‐PAs reflected interviewer ratings more closely than they reflected applicant or reference ratings. AVI‐PAs and interviewer ratings also had similar relations with applicants' interview behaviors, biographical information, and hireability. In Study 2 (N = 25), AVI‐PAs had weak‐to‐moderate (nonsignificant) relations with subsequent supervisor ratings of job performance. Empirically, the AVI‐PAs were most similar to interviewer ratings. AVI‐PAs, interviewer ratings, self‐reports, and reference‐reports all demonstrated weak discriminant validity evidence. LASSO regression provided superior (but still weak) discriminant evidence compared to elastic net regression. Despite using natural language embeddings to operationalize verbal behavior, the AVI‐PAs (except emotional stability) exhibited large correlations with interviewee word count. We discuss the implications of these findings for pre‐employment personality assessments and effective AVI‐PA design.
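To make two of the terms above concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It shows what a binary unigram encoding of interview transcripts might look like, and how one could check whether a set of scores tracks interviewee word count. The tokenizer, the function names, and the example transcripts are all assumptions introduced here for illustration. The RoBERTa embeddings and the LASSO and elastic net models are omitted.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def binary_unigram_features(transcripts):
    """Encode each transcript as 0/1 indicators of word presence.

    Returns (vocabulary, rows). rows[i][j] is 1 if vocabulary[j] occurs
    at least once in transcripts[i], else 0. Counts are deliberately ignored.
    """
    token_sets = [set(tokenize(t)) for t in transcripts]
    vocabulary = sorted(set().union(*token_sets))
    rows = [[1 if word in tokens else 0 for word in vocabulary]
            for tokens in token_sets]
    return vocabulary, rows


def word_count(text):
    """Number of word tokens in a transcript."""
    return len(tokenize(text))


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical transcripts, for illustration only.
transcripts = ["I led the team", "The team led me to lead"]
vocabulary, rows = binary_unigram_features(transcripts)
print(vocabulary)
print(rows)
print([word_count(t) for t in transcripts])
```

Because the encoding records only presence, a word repeated many times contributes the same feature value as a word said once. Longer answers still tend to switch on more distinct features, which is one plausible route by which predicted scores could correlate with word count. Passing predicted scores and word counts to `pearson` would quantify that relation for a given sample.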
Stevenor et al. studied this question.