What question did this study set out to answer?

This research aims to establish reliable methods for assessing the quality of medical image translation through expert evaluations and automated metrics.

March 28, 2026 · Open Access

Visual fidelity-driven quality assessment of medical image translation

Key Points

Evaluated image-to-image translation quality across four cross-modality synthesis tasks: three inter-MR translations and one CBCT-to-CT translation.
Combined large-scale expert visual quality assessment by thirteen raters with explainable automated IQA modeling.
Applied SynDiff, an adversarial diffusion-based framework, to the image synthesis tasks.
Computed ten reference-based and eight no-reference IQA metrics under four-fold cross-validation, then used Auto-Sklearn to learn ensemble regression models mapping those metrics to expert consensus ratings.
Ensemble regression model predictions were typically within ±0.5 Likert points of expert visual ratings.
Reference-based models agreed with visual ratings more closely than no-reference models (R² 0.75 vs. 0.59).
Explainability analyses identified structure- and contrast-sensitive metrics as key predictors.

Abstract

Automated and reliable image quality assessment (IQA) is essential for the safe use of medical image synthesis in critical applications such as adaptive radiotherapy, treatment planning, or missing-modality reconstruction, where unnoticed generative artifacts may adversely affect outcomes. We evaluated image-to-image translation quality by coupling large-scale expert visual quality assessment with explainable automated IQA modeling. An adversarial diffusion-based framework, SynDiff, was applied to four cross-modality synthesis tasks: three inter-MR translations and one CBCT-to-CT translation. Using four-fold cross-validation, ten reference-based and eight no-reference IQA metrics were computed for all synthesized images. Visual IQA ratings were independently collected from thirteen expert raters using a predetermined protocol and a specialized image viewer that enabled blinded, randomized six-point Likert scoring. Auto-Sklearn was employed to learn ensemble regression models mapping IQA metrics to visual consensus ratings, with separate models trained on reference-based and no-reference metrics. The models closely reproduced the distribution and ordering of expert ratings, typically within ±0.5 Likert points. Reference-based models achieved higher agreement with visual ratings than no-reference models (R² 0.75 vs. 0.59, respectively), although the latter remained unbiased and informative. Explainability analyses highlighted structure- and contrast-sensitive metrics as key predictors. Overall, the results demonstrate that ensemble regression models can provide transparent, scalable, and clinically meaningful quality control for generative medical imaging.
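The two agreement measures quoted above, R² and the share of predictions within ±0.5 Likert points, can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names `r_squared` and `fraction_within` are hypothetical, the ratings are made-up numbers, and the 1 to 6 coding of the six-point scale is an assumption.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed ratings have zero variance")
    return 1.0 - ss_res / ss_tot


def fraction_within(observed, predicted, tolerance=0.5):
    """Share of predictions within +/- tolerance of the observed rating."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


# Made-up illustration data (NOT from the study): consensus ratings on an
# assumed 1-6 Likert coding and the corresponding model predictions.
consensus = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.5, 2.0, 2.5, 4.0, 5.5, 5.0]

print(f"R^2 = {r_squared(consensus, predicted):.2f}")
print(f"within +/-0.5 = {fraction_within(consensus, predicted):.2f}")
```

Reporting both values is useful because R² summarizes how much of the rating variance the model explains, while the tolerance-based share expresses the error directly on the rating scale that the expert raters used.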
