Background/Objectives: To evaluate the effect of detector architecture and dataset characteristics on intracranial hemorrhage (ICH) subtype localization on noncontrast head CT, with emphasis on bidirectional cross-dataset generalization.

Methods: This retrospective study analyzed two publicly available datasets: the Brain Hemorrhage Extended (BHX) dataset and the RSNA 2019+ dataset. Models were trained and internally validated on one dataset and externally tested on the other, in both directions: BHX-to-RSNA+ and RSNA+-to-BHX. Six representative deep learning detectors, including CNN-based one-stage and two-stage detectors and a Swin Transformer-based RT-DETR (Swin-RT-DETR) variant, were evaluated. Localization performance was assessed using mean average precision at a bounding-box intersection-over-union threshold of 0.5 (mAP@50), bounding-box Dice similarity coefficient (BB-DSC), and bounding-box intersection-over-union (BB-IoU). Image-level and patient-level analyses were performed, with Bonferroni correction applied for statistical comparisons. Dataset characterization analyses were performed to compare subtype prevalence, bounding-box geometry, lesion burden, annotation density, and spatial distribution.

Results: Under internal validation, Swin-RT-DETR achieved competitive or superior performance across several ICH subtypes, but its advantage was subtype-dependent rather than uniform. Faster R-CNN with a ResNeXt101 backbone achieved comparable IVH performance and higher IPH BB-DSC and BB-IoU, whereas Swin-RT-DETR performed better for SAH, SDH, and EDH. External validation showed substantial performance degradation across architectures, subtypes, and validation directions. Absolute BB-DSC reductions for Swin-RT-DETR ranged from approximately 0.54 to 0.79 in the BHX-to-RSNA+ direction and from approximately 0.17 to 0.74 in the RSNA+-to-BHX direction. Similar degradation patterns were observed at the patient level. Statistical comparisons showed fewer significant model-level differences under external validation, suggesting attenuation of architecture-specific advantages under domain shift. Dataset characterization analysis demonstrated differences in subtype distribution, bounding-box geometry, lesion burden, annotation density, and spatial localization patterns between BHX and RSNA+.

Conclusions: ICH subtype localization performance is strongly influenced by dataset characteristics, annotation heterogeneity, and domain shift. Although Transformer-based hierarchical feature extraction showed subtype-dependent advantages under internal validation, these advantages diminished under bidirectional external validation. These findings highlight the need for dataset characterization, external validation, patient-level evaluation, and task-specific clinical benchmarks before automated ICH localization models can be considered for real-world clinical integration.
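To make the box-level metrics named in the Methods concrete, the following is a minimal sketch of how BB-IoU and BB-DSC could be computed for one predicted and one reference bounding box, together with the IoU >= 0.5 match criterion that underlies mAP@50. The abstract does not describe the authors' implementation, so the box format `(x_min, y_min, x_max, y_max)` and all function names here are assumptions made for illustration only.

```python
from typing import Tuple

# Assumed box format (not stated in the abstract): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes get area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_area(a: Box, b: Box) -> float:
    """Overlap area of two axis-aligned boxes (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def bb_iou(pred: Box, truth: Box) -> float:
    """Bounding-box intersection-over-union: |A ∩ B| / |A ∪ B|."""
    inter = intersection_area(pred, truth)
    union = box_area(pred) + box_area(truth) - inter
    return inter / union if union > 0 else 0.0


def bb_dsc(pred: Box, truth: Box) -> float:
    """Bounding-box Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = intersection_area(pred, truth)
    total = box_area(pred) + box_area(truth)
    return 2.0 * inter / total if total > 0 else 0.0


def matches_at_50(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: a prediction counts as a match when IoU >= 0.5."""
    return bb_iou(pred, truth) >= threshold


if __name__ == "__main__":
    pred = (0.0, 0.0, 10.0, 10.0)
    truth = (5.0, 0.0, 15.0, 10.0)
    print(bb_iou(pred, truth), bb_dsc(pred, truth), matches_at_50(pred, truth))
```

For a single box pair the two overlap scores are related by DSC = 2·IoU / (1 + IoU), so BB-DSC is never lower than BB-IoU. A full mAP@50 computation would additionally rank predictions by confidence and average precision over subtypes, which this sketch does not attempt.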
C C Lee
China Medical University
Hikam Muzakky
China Medical University
Cheng-En Juan
National Taiwan University
Diagnostics
National Taiwan University
National Yang Ming Chiao Tung University
National Tsing Hua University
Lee et al. studied this question.
synapsesocial.com/papers/6a2117bfd499ed480b1709ec — DOI: https://doi.org/10.3390/diagnostics16111705