Interlaboratory comparisons are essential for assessing measurement performance, especially in emerging fields such as dimensional X-ray computed tomography (CT). Earlier experimental initiatives, including the "CT Audit" and the "CIA-CT Comparison" conducted more than 12 years ago, provided valuable insights into CT metrology by documenting deviations between CT measurements and coordinate measuring machine (CMM) calibrations, thereby establishing the reliability of CT measurements relative to reference calibrations. Using data from these two foundational studies, this work examines how the choice of statistical metrics influences the interpretation of laboratory performance in CT interlaboratory comparisons when reference calibrations are not used. The study evaluates proficiency testing metrics, including uncertainty-normalized metrics (E-scores) and standard-deviation-based metrics (Z-scores), in terms of measurement performance, metrological compatibility, and statistical consistency. The findings indicate that inconsistent uncertainty estimates, often a consequence of the absence of standardized protocols, can lead to conflicting performance classifications. Although E-scores reflect metrological compatibility because they account for both biases and uncertainties, they are sensitive to errors in uncertainty estimation, which may mask biases or penalize accurate results. Z-scores, in contrast, are less affected by outliers or inflated uncertainties and are therefore more reliable. Combining E-scores with robust Z-scores yields a balanced framework for evaluating laboratory performance, one that integrates sensitivity to uncertainty with robustness to variability. Chi-square-like metrics (e.g., $\chi^2_{c}$, $\chi^2_{\mathrm{PD}_j}$, $\chi^2_{\mathrm{APD}}$) are also assessed to address challenges in uncertainty reporting and statistical consistency. Among these, the $\chi^2_{\mathrm{APD}}$ statistic, when paired with robust Z-scores, proves to be the most effective and discriminative method for assessing statistical consistency and pairwise equivalence.
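The two families of scores compared above can be illustrated with a minimal Python sketch. It assumes the conventional proficiency-testing definitions (as in ISO 13528): E-scores in the common $E_n$ form, $E_n = (x_i - X)/\sqrt{U_i^2 + U_{\mathrm{ref}}^2}$ with expanded ($k = 2$) uncertainties, and a robust Z-score that takes the median as the assigned value and $1.4826 \cdot \mathrm{MAD}$ as the robust standard deviation. The function names and the example data are hypothetical, and the scaled-MAD estimator is only a stand-in for whichever robust estimator (e.g., ISO 13528 Algorithm A) the study itself applies.

```python
import math
import statistics


def en_scores(values, expanded_unc, ref_value, ref_expanded_unc=0.0):
    """E_n = (x_i - X) / sqrt(U_i^2 + U_ref^2), with expanded (k = 2) uncertainties.

    |E_n| <= 1 is conventionally read as metrologically compatible.
    """
    return [(x - ref_value) / math.hypot(U, ref_expanded_unc)
            for x, U in zip(values, expanded_unc)]


def robust_z_scores(values):
    """z_i = (x_i - median) / (1.4826 * MAD), a simple robust Z-score.

    Uses only the spread of the reported values, not the reported uncertainties.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; a robust scale cannot be estimated")
    sigma = 1.4826 * mad  # scales the MAD to match the SD for normal data
    return [(x - med) / sigma for x in values]


def classify_z(z):
    """Conventional reading: |z| <= 2 satisfactory, 2 < |z| < 3 questionable."""
    a = abs(z)
    if a <= 2:
        return "satisfactory"
    if a < 3:
        return "questionable"
    return "unsatisfactory"


if __name__ == "__main__":
    # Hypothetical deviations from nominal (µm) and expanded uncertainties (µm)
    x = [2.0, 4.0, -2.0, 1.0, 15.0]
    U = [3.0, 5.0, 4.0, 2.0, 6.0]
    # Hypothetical reference value 1.5 µm with expanded uncertainty 1.0 µm
    en = en_scores(x, U, 1.5, 1.0)
    z = robust_z_scores(x)
    for i, (e, zi) in enumerate(zip(en, z), start=1):
        print(f"lab {i}: E_n = {e:+.2f}, z = {zi:+.2f} ({classify_z(zi)})")
```

With these made-up numbers only the fifth laboratory is flagged by both scores ($|E_n| > 1$ and $|z| > 3$). Inflating that laboratory's reported uncertainty would shrink its $E_n$ while leaving its $z$ unchanged, which is the sensitivity to uncertainty estimates described above.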
For comparisons against reference values with negligible uncertainty (e.g., from CMMs), $\chi^2_{c}$ is more appropriate. Monte Carlo simulations further strengthen the evaluation of statistical consistency, offering deeper insight into measurement reliability. This study recommends adopting a dual-metric framework that combines E-scores and robust Z-scores; standardizing uncertainty estimation protocols to ensure consistency; implementing systematic outlier detection methods such as Mandel’s h and k statistics or Cochran’s and Grubbs’ tests; and incorporating Monte Carlo simulations to make evaluations more rigorous. These recommendations address challenges in uncertainty estimation and metric selection and support reliable assessments of laboratory performance in CT dimensional metrology. The findings lay a foundation for improving the reliability and comparability of CT dimensional data.
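The chi-square-type consistency checks and the Monte Carlo evaluation mentioned above can be sketched in the same way. Here $\chi^2_{c}$ is written in its common form for a reference with negligible uncertainty, $\sum_i \left((x_i - x_{\mathrm{ref}})/u_i\right)^2$, which for consistent, independent results follows a $\chi^2$ distribution with $N$ degrees of freedom. The exact definition of $\chi^2_{\mathrm{APD}}$ used in the study is not given here, so the all-pairwise-differences form below, $\sum_{i<j} (x_i - x_j)^2/(u_i^2 + u_j^2)$, is an assumption for illustration only. Because its pairwise terms share laboratories, that sum is not a standard $\chi^2$ variable, which is one reason a Monte Carlo reference distribution is convenient: the sketch simulates datasets in which every laboratory measures the same true value with independent Gaussian errors of its stated standard uncertainty $u_i$ and reports the fraction of simulated statistics at least as large as the observed one. Function names, data, and trial counts are hypothetical; $\chi^2_{\mathrm{PD}_j}$, Mandel’s statistics, and Cochran’s and Grubbs’ tests are not shown.

```python
import random
from itertools import combinations


def chi2_c(values, std_unc, ref_value):
    """Sum of ((x_i - x_ref) / u_i)^2 against a reference of negligible uncertainty."""
    return sum(((x - ref_value) / u) ** 2 for x, u in zip(values, std_unc))


def chi2_apd(values, std_unc):
    """ASSUMED all-pairwise-differences form: sum_{i<j} (x_i - x_j)^2 / (u_i^2 + u_j^2)."""
    pairs = combinations(zip(values, std_unc), 2)
    return sum((xi - xj) ** 2 / (ui ** 2 + uj ** 2) for (xi, ui), (xj, uj) in pairs)


def mc_p_value(statistic, observed, std_unc, true_value=0.0, n_trials=20000, seed=1):
    """Monte Carlo p-value under the hypothesis that all labs are consistent.

    Each trial draws x_i ~ N(true_value, u_i^2) independently and evaluates
    `statistic` on the simulated results. The (count + 1) / (n + 1) form
    avoids ever reporting a p-value of exactly zero.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        simulated = [rng.gauss(true_value, u) for u in std_unc]
        if statistic(simulated) >= observed:
            exceed += 1
    return (exceed + 1) / (n_trials + 1)


if __name__ == "__main__":
    # Hypothetical deviations (µm) and standard uncertainties (k = 1, µm)
    x = [2.0, 4.0, -2.0, 1.0, 15.0]
    u = [1.5, 2.5, 2.0, 1.0, 3.0]
    x_ref = 1.5  # hypothetical CMM reference value, uncertainty treated as negligible

    obs_c = chi2_c(x, u, x_ref)
    p_c = mc_p_value(lambda v: chi2_c(v, u, x_ref), obs_c, u, true_value=x_ref)

    # chi2_apd is invariant to a common shift, so the default true value 0 suffices
    obs_apd = chi2_apd(x, u)
    p_apd = mc_p_value(lambda v: chi2_apd(v, u), obs_apd, u)

    print(f"chi2_c   = {obs_c:.2f}, Monte Carlo p = {p_c:.4f}")
    print(f"chi2_APD = {obs_apd:.2f}, Monte Carlo p = {p_apd:.4f}")
```

The same `mc_p_value` helper accepts any statistic that can be recomputed from simulated results, so it can be reused when the analytic distribution of a consistency statistic is unknown or awkward to derive.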
Villarraga-Gómez et al. studied this question.