Facial identification examiners assess whether two facial images, such as an image of an unknown person from surveillance footage and a controlled image of a known individual, depict the same person or different people. To communicate their conclusions, they rely on predefined verbal articulation scales, some of which have associated numeric equivalents. However, these terms have not been calibrated against the actual strength of the evidence, except indirectly through proficiency tests and black-box studies. The present research reanalyzes the face comparison results from the most comprehensive facial identification black-box study to date, as well as from multiple facial examination proficiency tests, to generate a quantitative measure of the strength of the evidence for each comparison. We used an ordered probit model to summarize the distribution of responses from both individual examiners and examiner teams, producing a set of likelihood ratios for each group and test. These likelihood ratios can be lower than the values implied by the evaluative statements, and they do not seem to justify the strengths of evidence implied by the articulation scales currently used in facial comparisons. Our analyses suggest that examiners are using language that overstates the strength of the evidence by several orders of magnitude.
Aggadi et al. (Mon,) studied this question.
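To illustrate how an ordered probit model can yield a likelihood ratio for each response category, the following is a minimal sketch and not the authors' implementation. It assumes a unit-variance latent normal variable whose mean differs between same-source and different-source comparisons, cut by ordered thresholds into response categories. The function names, the five-category scale, and all numeric values are hypothetical and chosen only for illustration; in practice the means and thresholds would be estimated from examiner responses.

```python
from statistics import NormalDist


def category_probs(mean, thresholds):
    """Probability of each ordered response category.

    Assumes a latent normal variable with the given mean and unit
    variance, cut at the ordered thresholds (an illustrative assumption).
    """
    latent = NormalDist(mu=mean, sigma=1.0)
    # Cumulative probabilities at each cut point, padded with 0 and 1.
    cuts = [0.0] + [latent.cdf(t) for t in thresholds] + [1.0]
    # Each category's probability is the mass between adjacent cuts.
    return [hi - lo for lo, hi in zip(cuts, cuts[1:])]


def likelihood_ratios(mu_same, mu_diff, thresholds):
    """LR per category: P(response | same source) / P(response | different source)."""
    p_same = category_probs(mu_same, thresholds)
    p_diff = category_probs(mu_diff, thresholds)
    return [s / d for s, d in zip(p_same, p_diff)]


# Hypothetical parameters: four thresholds define five response categories.
thresholds = [-1.5, -0.5, 0.5, 1.5]
lrs = likelihood_ratios(mu_same=1.0, mu_diff=-1.0, thresholds=thresholds)

for k, lr in enumerate(lrs, start=1):
    print(f"category {k}: LR = {lr:.3g}")
```

Under these assumptions, the likelihood ratio attached to a verbal category depends on how often examiners choose that category for same-source versus different-source pairs, which is the quantity that can then be compared with the numeric value the articulation scale implies.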