What question did this study set out to answer?

This research aims to examine how performance metrics affect the interpretation of laboratory evaluations in dimensional X-ray computed tomography.

April 14, 2026 | Open Access

Comparison of performance metrics for interlaboratory evaluations in dimensional X-ray computed tomography

Key Points

Analyzed metrics including E-scores and Z-scores for laboratory performance assessments.
Evaluated statistical consistency using chi-square-like metrics.
Implemented Monte Carlo simulations to enhance statistical reliability.
Developed recommendations for standardizing uncertainty estimation protocols and outlier detection.
Found that inconsistent uncertainty estimates can lead to conflicting performance classifications.
E-scores align with metrological compatibility but are sensitive to uncertainty errors.
Z-scores provide greater reliability against outliers and inflated uncertainties.
Combining E-scores with robust Z-scores creates a balanced evaluation framework.
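The contrast between E-scores and robust Z-scores in the points above can be illustrated with a minimal sketch. It assumes the common proficiency-testing forms (deviation divided by the root-sum-square of expanded uncertainties for the E-score; median and scaled MAD for the robust Z-score), which may differ from the exact definitions used in the paper. All values, the consensus reference, and the names `en_score` and `robust_z_scores` are hypothetical.

```python
import math
import statistics


def en_score(x, U_lab, x_ref, U_ref):
    """E-score: deviation normalized by the combined expanded uncertainties."""
    return (x - x_ref) / math.sqrt(U_lab ** 2 + U_ref ** 2)


def robust_z_scores(values):
    """Z-scores built from the median and the scaled MAD instead of mean and SD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    sigma = 1.4826 * mad  # scaling makes the MAD comparable to a normal SD
    return [(v - med) / sigma for v in values]


# Hypothetical lengths in mm reported by five labs for the same feature.
values = [10.002, 10.004, 9.998, 10.001, 10.050]
# Hypothetical expanded uncertainties (k = 2); the last lab reports an inflated one.
U_labs = [0.004, 0.004, 0.004, 0.004, 0.060]
x_ref = statistics.median(values)  # assumed consensus value (no CMM reference)
U_ref = 0.002                      # assumed expanded uncertainty of that value

en = [en_score(x, U, x_ref, U_ref) for x, U in zip(values, U_labs)]
z = robust_z_scores(values)

for i, (e, zi) in enumerate(zip(en, z), start=1):
    print(f"lab {i}: E = {e:+.2f}, robust Z = {zi:+.2f}")
```

With these made-up numbers, the fifth lab stays within the usual |E| ≤ 1 limit only because its stated uncertainty is large, while its robust Z-score lies far beyond the usual |Z| ≥ 3 action limit. This is the kind of conflicting classification the key points describe.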

Abstract

Interlaboratory comparisons are essential for assessing measurement performance, especially in emerging fields such as dimensional X-ray computed tomography (CT). Previous experimental initiatives, including the "CT Audit" and the "CIA-CT Comparison" conducted over 12 years ago, provided valuable insights into CT metrology by documenting deviations between CT measurements and coordinate measuring machine (CMM) calibrations, thereby establishing their reliability based on reference calibrations. Based on data from these two foundational works, this study examines how statistical metrics can influence the interpretation of laboratory performance in CT interlaboratory comparisons when reference calibrations are not utilized. The research evaluates proficiency testing metrics, including uncertainty-normalized metrics (E-scores) and standard deviation-based metrics (Z-scores), to assess measurement performance, metrological compatibility, and statistical consistency. Findings indicate that inconsistent uncertainty estimates, often influenced by the absence of standardized protocols, can lead to conflicting performance classifications. While E-scores align with metrological compatibility by incorporating biases and uncertainties, they are sensitive to uncertainty estimation errors, which may obscure biases or penalize accurate results. In contrast, Z-scores are less affected by outliers or inflated uncertainties, offering greater reliability. Combining E-scores with robust Z-scores provides a balanced framework for evaluating laboratory performance, integrating sensitivity to uncertainty with robustness to variability. Chi-square-like metrics (e.g., $\chi_c^2$, $\chi_{PDj}^2$, $\chi_{APD}^2$) are assessed to address challenges in uncertainty reporting and statistical consistency. Among these, the $\chi_{APD}^2$ statistic, when paired with robust Z-scores, proves to be the most effective and discriminative method for assessing statistical consistency and pairwise equivalence. For comparisons against reference values with negligible uncertainty (e.g., from CMMs), $\chi_c^2$ is more appropriate. Monte Carlo simulations further enhance statistical consistency evaluation, offering deeper insights into measurement reliability. This study recommends adopting a dual-metric framework combining E-scores and robust Z-scores, standardizing uncertainty estimation protocols to ensure consistency, implementing systematic outlier detection methods such as Mandel’s coefficients or Cochran and Grubbs’ tests, and incorporating Monte Carlo simulations to improve evaluation rigor. These recommendations address challenges in uncertainty estimation and metric selection, advancing reliable assessments of laboratory performance in CT dimensional metrology. The findings lay the foundation for improving the reliability and comparability of CT dimensional data.
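As a rough illustration of a chi-square consistency check supported by Monte Carlo simulation, the sketch below uses a generic form: the sum of squared deviations from a reference value with negligible uncertainty, each divided by the lab's standard uncertainty. This is an assumption for illustration and not necessarily the paper's definition of any of its chi-square statistics. The data, the reference value, and the function names are hypothetical.

```python
import random


def chi_square_obs(values, u, x_ref):
    """Sum of squared deviations from x_ref, each scaled by its standard uncertainty."""
    return sum(((x - x_ref) / ui) ** 2 for x, ui in zip(values, u))


def mc_p_value(values, u, x_ref, n_trials=20000, seed=0):
    """Estimate how often consistent labs would give a chi-square this large.

    Null hypothesis (assumed): each result is normally distributed around
    x_ref with the lab's stated standard uncertainty.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = chi_square_obs(values, u, x_ref)
    exceed = 0
    for _ in range(n_trials):
        simulated = [rng.gauss(x_ref, ui) for ui in u]
        if chi_square_obs(simulated, u, x_ref) >= observed:
            exceed += 1
    return observed, exceed / n_trials


# Hypothetical lengths in mm and standard uncertainties (k = 1) from five labs.
values = [10.002, 10.004, 9.998, 10.001, 10.050]
u = [0.002, 0.002, 0.002, 0.002, 0.030]
x_ref = 10.000  # hypothetical reference value with negligible uncertainty

observed, p_value = mc_p_value(values, u, x_ref)
print(f"chi-square = {observed:.2f}, Monte Carlo p-value = {p_value:.3f}")
```

A small p-value would suggest that the reported results and uncertainties are not mutually consistent with the reference. Note that a large stated uncertainty, as for the fifth lab here, shrinks that lab's contribution to the statistic, which is one reason a check of this kind is usefully paired with a robust Z-score.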

Cite This Study

Villarraga-Gómez et al. studied this question.

synapsesocial.com/papers/69ddd959e195c95cdefd6a56 https://doi.org/10.1016/j.tmater.2026.100085
