March 3, 2026 | Open Access

Do evaluative statements in facial identification overstate the strength of the evidence?

Key Points

The language examiners use in facial identification may overstate the actual strength of the evidence and mislead those who interpret it.
Misconceptions arise when the strength of evidence is conveyed through verbal scales and numeric equivalents that have not been calibrated.
Using an ordered probit model, the study estimates likelihood ratios from extensive proficiency tests and black box studies.
Findings highlight the critical need for recalibration of evaluative language to align with actual evidence strength.

Abstract

Facial identification examiners assess whether two facial images (such as an image of an unknown person from surveillance footage and a controlled image of a known individual) depict the same person or different people. To communicate their observations, they rely on predefined verbal articulation scales that sometimes have associated numeric equivalents. However, these terms have not been calibrated against the actual strength of the evidence except indirectly through proficiency tests and black box studies. The present research reanalyzes the findings of face comparisons from the most comprehensive facial identification black box study to date, as well as multiple facial examination proficiency tests, to generate a quantitative measure of the strength of the evidence for each comparison. We used an ordered probit model to summarize the distribution of responses of both individual examiners and examiner teams and to produce a set of likelihood ratios for each group and test. The likelihood ratios can be lower than the values implied by the evaluative statements, and they do not seem to justify the strengths of evidence implied by the articulation scales currently used in facial comparisons. Our analyses suggest that examiners are using language that overstates the strength of the evidence by several orders of magnitude.
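To illustrate how an ordered probit model can turn the distribution of examiner responses into likelihood ratios, the following is a minimal sketch and not the study's implementation. It assumes a latent score with unit variance whose mean depends on whether the pair is same-source or different-source, and a shared set of cutpoints that divide the latent scale into ordered response categories. The function names, the cutpoints, and the two latent means are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probs(mu, cutpoints):
    """Ordered probit: probability of each ordered response category
    when the latent score is Normal(mu, 1) and the cutpoints separate
    adjacent categories."""
    edges = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [norm_cdf(e - mu) for e in edges]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]

def likelihood_ratios(mu_same, mu_diff, cutpoints):
    """Likelihood ratio for each response category:
    P(response | same source) / P(response | different source)."""
    p_same = category_probs(mu_same, cutpoints)
    p_diff = category_probs(mu_diff, cutpoints)
    return [ps / pd for ps, pd in zip(p_same, p_diff)]

# Hypothetical parameters (assumptions, not fitted values from the study).
cutpoints = [-2.0, -1.0, 0.0, 1.0, 2.0]  # six ordered response categories
mu_same, mu_diff = 1.5, -1.5             # latent means under each proposition

p_same = category_probs(mu_same, cutpoints)
lrs = likelihood_ratios(mu_same, mu_diff, cutpoints)

# Report each category's likelihood ratio on a log10 scale, which is the
# scale on which "orders of magnitude" comparisons are made.
for k, lr in enumerate(lrs, start=1):
    print(f"category {k}: LR = {lr:.3g}, log10(LR) = {math.log10(lr):.2f}")
```

In a sketch like this, the likelihood ratio attached to a response category can be compared with the numeric equivalent that a verbal scale assigns to the same category; a gap on the log10 scale corresponds to the kind of overstatement the abstract describes.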
