Abstract

Objectives: Deep learning models developed for the classification of radiological reports have lacked explainability. We aimed to validate and explain a pretrained classification model by applying it to the removal of confounding data from a radiological dataset.

Methods: Two radiologists categorised 2038 anonymised MRI head free-text radiology reports for abnormality and for small vessel disease presence. Of these reports, 80% (n = 1630) were used to fine-tune pretrained transformer models to classify scans. Five-fold cross-validation was used in model development. The models were tested on the remaining 20% of the reports (n = 408). SHapley Additive exPlanations (SHAP) were used to explain the results.

Results: The models exhibited excellent classification performance, with a mean Receiver Operating Characteristic (ROC) Area-Under-the-Curve (AUC) of 0.98 for abnormality classification and 0.99 for small vessel disease classification. SHAP highlighted relevant words in both cases.

Conclusions: This application validated the use of a pretrained transformer in detecting confounding data in research cohorts, and exhibited explainable results that allow the models’ decisions to be understood. By highlighting the specific report terms that drive each prediction, the explainable model output can be reviewed and critiqued by subject matter experts, supporting trust, error analysis, and iterative refinement of AI tools within clinical workflows.

Advances in knowledge: This application demonstrates the feasibility of explainable report classification, and the fine-tuned model could be used in future for automatic removal of confounding data from radiology datasets, while providing transparent, case-level justifications that support audit, governance, and clinician acceptance.
Courtman et al. studied this question.
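The data partitioning and evaluation metric described in the Methods can be illustrated with a minimal sketch. It uses only the Python standard library and makes several assumptions that the abstract does not confirm: the 80/20 split is taken to be a simple random shuffle (the authors may have stratified by label), the function names `split_train_test`, `k_fold`, and `roc_auc` are hypothetical, and the AUC is computed with the standard pairwise-ranking formulation rather than any particular library. The labels and scores in the final call are made-up toy values, not study data.

```python
import random


def split_train_test(n_reports, train_fraction=0.8, seed=0):
    """Shuffle report indices and split them into training and held-out test sets."""
    indices = list(range(n_reports))
    random.Random(seed).shuffle(indices)  # assumption: plain random split
    n_train = int(n_reports * train_fraction)  # 2038 * 0.8 = 1630.4, truncated to 1630
    return indices[:n_train], indices[n_train:]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Reproduce the report counts given in the abstract.
train_idx, test_idx = split_train_test(2038)
print(len(train_idx), len(test_idx))  # 1630 408

# Five folds over the 1630 training reports.
for fold_train, fold_val in k_fold(train_idx):
    print(len(fold_train), len(fold_val))  # 1304 326 for every fold

# Toy AUC example with invented labels and scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise formulation is quadratic in the number of reports, which is acceptable for a 408-report test set. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.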