Abstract

Background: Deep learning (DL) has shown significant potential in medical image quality assessment and has been applied to quality evaluation across imaging modalities and body regions, improving the efficiency of clinical image quality control.

Purpose: To develop and evaluate an automated, interpretable image quality assessment (IQA) system for head CT scans, based on DL and linear regression.

Methods: An automated IQA system for head CT images was built as a hierarchical framework with three levels: 6 categories of quality issues, 10 quality items, and 10 image quality metrics (IQMs). The IQMs were measured by the DL models and then linearly regressed to yield a score for each category; the category scores were combined into an overall quality score for each head CT scan. To the best of our knowledge, this is the first hierarchical framework to integrate DL-based segmentation and detection with interpretable linear regression for medical image quality assessment. The primary dataset of 307 head CT scans was collected from Zhejiang Provincial People’s Hospital and randomly split into a regression set (80%, n = 246) and a validation set (20%, n = 61). In addition, 262 head CT scans from three external centers were collected to evaluate the system’s generalizability. Five experienced radiologists independently and quantitatively assessed the quality of the CT scans, providing the reference standard. The regression set was used to fit the IQA system to the experts’ assessments, and the validation set was used for performance evaluation.
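The hierarchical scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grouping of the 10 IQMs into the 6 categories (`CATEGORY_TO_IQMS`), the function names, and the choice to combine category scores by summation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical grouping of the 10 IQM columns into 6 quality-issue categories.
# The real assignment is not given in the abstract.
CATEGORY_TO_IQMS = {
    "category_1": [0, 1],
    "category_2": [2, 3],
    "category_3": [4],
    "category_4": [5, 6],
    "category_5": [7, 8],
    "category_6": [9],
}


def _design_matrix(iqms, cols):
    """Select the IQM columns for one category and append an intercept column."""
    return np.column_stack([iqms[:, cols], np.ones(len(iqms))])


def fit_category_models(iqms, expert_scores):
    """Fit one ordinary least-squares model per category.

    iqms:          array of shape (n_scans, 10), DL-measured metrics
    expert_scores: array of shape (n_scans, 6), reference category scores
    """
    models = {}
    for j, (name, cols) in enumerate(CATEGORY_TO_IQMS.items()):
        X = _design_matrix(iqms, cols)
        coef, *_ = np.linalg.lstsq(X, expert_scores[:, j], rcond=None)
        models[name] = (cols, coef)
    return models


def predict_scores(models, iqms):
    """Return per-category scores and an overall score per scan.

    The overall score is assumed here to be the plain sum of category scores.
    """
    per_category = {
        name: _design_matrix(iqms, cols) @ coef
        for name, (cols, coef) in models.items()
    }
    overall = np.sum(list(per_category.values()), axis=0)
    return per_category, overall
```

Because each category score is a linear function of a few named metrics, the fitted coefficients can be read directly to see which measurement drove a low score, which is the sense in which such a design is interpretable.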
Given the importance of the IQMs to the proposed system, the measurements from the DL models were further compared with those of the experts using metrics such as the intra-class correlation coefficient (ICC) and mean absolute deviation (MAD). Correlations between the IQMs and the experts’ scores were evaluated with Pearson correlation analysis. On the validation set, the overall performance of the IQA system was evaluated with a paired t-test, mean absolute percentage error (MAPE), and Bland-Altman analysis with a 95% confidence interval.

Results: On the primary dataset, the IQMs derived from the DL models demonstrated strong agreement with the radiologists’ measurements (ICC: 0.87 to 0.99). Significant correlations were found between all 10 quantitative metrics and the radiologists’ subjective scores, as well as between the 6 quality issue scores and the overall image quality score (p < 0.05). On the validation set, the paired t-test showed no significant difference between the system’s scores and the radiologists’ scores (p > 0.05), and Bland-Altman analysis also suggested good agreement between the system and the radiologists, with no evidence of systematic bias. The experimental results for the three external datasets are provided in the Supplementary Material.

Conclusion: The proposed automated IQA system achieved performance comparable to that of expert radiologists in assessing the quality of head CT scans, offering a reliable tool for clinical quality control.
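For readers unfamiliar with the agreement statistics named in the Methods, the following is a minimal sketch of how they might be computed for paired system and expert scores. The function name `agreement_summary` is hypothetical, MAD is assumed here to mean the mean absolute difference between the two sets of values, and the Bland-Altman limits use the usual bias ± 1.96 SD convention. ICC is omitted because the abstract does not state which ICC form was used.

```python
import numpy as np
from scipy import stats


def agreement_summary(system, expert):
    """Summarize agreement between paired system and expert scores."""
    system = np.asarray(system, dtype=float)
    expert = np.asarray(expert, dtype=float)
    diff = system - expert

    # Mean absolute percentage error, relative to the expert reference.
    mape = float(np.mean(np.abs(diff) / np.abs(expert)) * 100.0)

    # Mean absolute difference (assumed meaning of MAD in this sketch).
    mad = float(np.mean(np.abs(diff)))

    # Bland-Altman bias and 95% limits of agreement.
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Paired t-test: is the mean difference distinguishable from zero?
    t_stat, p_value = stats.ttest_rel(system, expert)

    # Pearson correlation between the two sets of scores.
    r, _ = stats.pearsonr(system, expert)

    return {
        "mape_percent": mape,
        "mad": mad,
        "bias": bias,
        "limits_of_agreement": limits,
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "pearson_r": float(r),
    }
```

A paired t-test p-value above 0.05 together with a Bland-Altman bias near zero is the pattern the abstract describes as agreement without systematic bias.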
Zhang et al. studied this question.