Diabetic foot osteomyelitis (DFO) is a leading cause of lower-extremity complications in individuals with diabetes, and timely, accurate screening is critical to prevent severe outcomes such as limb amputation. Although conventional radiography remains the most accessible imaging modality, the subtle and heterogeneous appearance of DFO often leads to delayed or missed detection. Despite the rich morphological information encoded in foot radiographs, current deep learning methods often fail to capture localized pathological patterns because of architectural limitations. In this work, we propose Dual Backbone with Gated Fusion and Transformer Encoder (DualBack-GFT), a deep learning framework for automated detection and localization of DFO in plain radiographs. The model uses two complementary backbones, EfficientNet-B6 and ResNet-50, whose features are fused through a gating mechanism that adaptively combines image-specific features. The fused representations are then refined by transformer encoders, which model long-range dependencies. The architecture operates in two stages: binary classification followed by confidence-weighted bounding-box localization. We evaluate DualBack-GFT on a curated, expert-annotated baseline dataset of diabetic foot X-rays with both diagnostic and bounding-box labels. The model achieves an AUC of 0.9683 and an average ground-truth coverage of 62.71%, outperforming established baselines. These results underscore the potential of two-stage, attention-enhanced models for interpretable and robust DFO assessment in clinical radiographs.
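The abstract does not give the exact form of the gate, so the following is only a minimal NumPy sketch of one common gated-fusion formulation, not the authors' implementation. It assumes that pooled features from each backbone (2304-dimensional for EfficientNet-B6, 2048-dimensional for ResNet-50) are first projected to a shared width `d`, that a sigmoid gate is computed from the concatenated projections, and that the output is an element-wise convex combination of the two. The function name `gated_fusion`, the parameter names, the width `d = 256`, and the random weights (standing in for learned parameters) are all hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(f_a, f_b, p_a, p_b, w_gate, b_gate):
    """Hypothetical gated fusion of two backbone feature vectors."""
    # Project both backbone features to a shared width d.
    h_a = f_a @ p_a  # (batch, d)
    h_b = f_b @ p_b  # (batch, d)
    # Gate computed from both views; every entry lies in (0, 1).
    g = sigmoid(np.concatenate([h_a, h_b], axis=-1) @ w_gate + b_gate)
    # Element-wise convex combination: g decides how much of each backbone to keep.
    return g * h_a + (1.0 - g) * h_b


rng = np.random.default_rng(0)
batch, d = 2, 256  # assumed batch size and shared feature width
f_eff = rng.standard_normal((batch, 2304))  # stand-in for EfficientNet-B6 pooled features
f_res = rng.standard_normal((batch, 2048))  # stand-in for ResNet-50 pooled features
p_eff = rng.standard_normal((2304, d)) * 0.02  # random projections (assumed, untrained)
p_res = rng.standard_normal((2048, d)) * 0.02
w_gate = rng.standard_normal((2 * d, d)) * 0.02
b_gate = np.zeros(d)

fused = gated_fusion(f_eff, f_res, p_eff, p_res, w_gate, b_gate)
print(fused.shape)  # (2, 256)
```

Because the gate is per feature rather than a single scalar, this formulation lets each image draw some dimensions mostly from one backbone and others mostly from the other; in a full model the fused vectors (or fused spatial tokens) would then be passed to the transformer encoders.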
Abbas et al. studied this question.