June 4, 2026 | Open Access

Cross-Dataset Generalization of Deep Learning-Based Detectors for Intracranial Hemorrhage Subtype Localization on Noncontrast Head CT: A Comparative Study

Key Points

To evaluate how detector architecture and dataset properties affect intracranial hemorrhage subtype localization on noncontrast head CT.
Retrospective analysis of Brain Hemorrhage Extended (BHX) and RSNA 2019+ datasets.
Assessment of six deep learning detectors, including CNN-based and Swin Transformer-based models.
Performance evaluated using mean average precision, Dice similarity coefficient, and intersection-over-union metrics.
Under internal validation, Swin-RT-DETR performed better for subarachnoid, subdural, and epidural hemorrhage, whereas Faster R-CNN with a ResNeXt101 backbone achieved higher BB-DSC and BB-IoU for intraparenchymal hemorrhage.
External validation revealed substantial performance degradation across detectors and both validation directions; for Swin-RT-DETR, absolute BB-DSC reductions were approximately 0.54–0.79 in the BHX-to-RSNA+ direction and 0.17–0.74 in the RSNA+-to-BHX direction.
Statistical analysis indicated fewer significant differences among models during external validation, suggesting diminished architecture-specific advantages.
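
For readers unfamiliar with the box-overlap metrics named above, the following is a minimal sketch of how BB-IoU and BB-DSC can be computed for a single predicted/reference box pair. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates and the standard area-based definitions; the function names are illustrative, and the study's own implementation may differ.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def bb_iou(a: Box, b: Box) -> float:
    """Bounding-box intersection-over-union: |A ∩ B| / |A ∪ B|."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def bb_dsc(a: Box, b: Box) -> float:
    """Bounding-box Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = intersection_area(a, b)
    total = box_area(a) + box_area(b)
    return 2.0 * inter / total if total > 0 else 0.0


# Illustrative boxes only, not taken from either dataset.
predicted = (0.0, 0.0, 2.0, 2.0)
reference = (1.0, 1.0, 3.0, 3.0)
iou = bb_iou(predicted, reference)
dsc = bb_dsc(predicted, reference)
```

For these two 2×2 boxes offset diagonally by one pixel, the overlap is 1 and the union is 7, giving an IoU of 1/7 and a Dice coefficient of 0.25. Under these definitions the two metrics are related by DSC = 2·IoU / (1 + IoU), so the Dice value is never lower than the IoU for the same box pair.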

Abstract

Background/Objectives: To evaluate the effect of detector architecture and dataset characteristics on intracranial hemorrhage (ICH) subtype localization on noncontrast head CT, with emphasis on bidirectional cross-dataset generalization.

Methods: This retrospective study analyzed two publicly available datasets: the Brain Hemorrhage Extended (BHX) dataset and the RSNA 2019+ dataset. Models were trained and internally validated on one dataset and externally tested on the other, in both directions: BHX-to-RSNA+ and RSNA+-to-BHX. Six representative deep learning detectors, including CNN-based one-stage and two-stage detectors and a Swin Transformer-based RT-DETR (Swin-RT-DETR) variant, were evaluated. Localization performance was assessed using mean average precision at a bounding-box intersection-over-union threshold of 0.5 (mAP@50), bounding-box Dice similarity coefficient (BB-DSC), and bounding-box intersection-over-union (BB-IoU). Image-level and patient-level analyses were performed, with Bonferroni correction applied for statistical comparisons. Dataset characterization analyses were performed to compare subtype prevalence, bounding-box geometry, lesion burden, annotation density, and spatial distribution.

Results: Under internal validation, Swin-RT-DETR achieved competitive or superior performance across several ICH subtypes, but its advantage was subtype-dependent rather than uniform. Faster R-CNN with a ResNeXt101 backbone achieved comparable intraventricular hemorrhage (IVH) performance and higher intraparenchymal hemorrhage (IPH) BB-DSC and BB-IoU, whereas Swin-RT-DETR performed better for subarachnoid (SAH), subdural (SDH), and epidural (EDH) hemorrhage. External validation showed substantial performance degradation across architectures, subtypes, and validation directions. Absolute BB-DSC reductions for Swin-RT-DETR ranged from approximately 0.54 to 0.79 in the BHX-to-RSNA+ direction and from 0.17 to 0.74 in the RSNA+-to-BHX direction. Similar degradation patterns were observed at the patient level. Statistical comparisons showed fewer significant model-level differences under external validation, suggesting attenuation of architecture-specific advantages under domain shift. Dataset characterization analysis demonstrated differences in subtype distribution, bounding-box geometry, lesion burden, annotation density, and spatial localization patterns between BHX and RSNA+.

Conclusions: ICH subtype localization performance is strongly influenced by dataset characteristics, annotation heterogeneity, and domain shift. Although Transformer-based hierarchical feature extraction showed subtype-dependent advantages under internal validation, these advantages diminished under bidirectional external validation. These findings highlight the need for dataset characterization, external validation, patient-level evaluation, and task-specific clinical benchmarks before automated ICH localization models can be considered for real-world clinical integration.
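
The Bonferroni correction mentioned in the Methods can be sketched as follows. This is a generic illustration, not the study's analysis code: the p-values are made-up placeholders, and the count of 15 comparisons assumes that every pair among the six detectors is tested, which the abstract does not state.

```python
from math import comb
from typing import List, Tuple


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[Tuple[float, bool]]:
    """Return (adjusted p-value, reject null?) for each raw p-value.

    Each raw p-value is multiplied by the number of comparisons m and
    capped at 1.0; a comparison is called significant if the adjusted
    value is at most alpha.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        p_adj = min(1.0, p * m)
        results.append((p_adj, p_adj <= alpha))
    return results


# Assumption: all pairwise comparisons among six detectors.
N_DETECTORS = 6
N_PAIRS = comb(N_DETECTORS, 2)

# Placeholder p-values for illustration only (not results from the study).
example_p = [0.001, 0.004, 0.03]
example_results = bonferroni(example_p)
```

With three comparisons, a raw p-value of 0.03 becomes 0.09 after adjustment and no longer passes a 0.05 threshold, which illustrates why the number of significant model-level differences can shrink once multiple comparisons are accounted for.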
