Objectives: Primary bone tumors such as osteosarcoma and chondrosarcoma are rare but aggressive malignancies that require early and accurate diagnosis. Although X-ray radiography is widely accessible, detecting small or multiple lesions on radiographs remains challenging. Existing deep learning models are often trained on small, single-center datasets and generalize poorly, which limits their clinical effectiveness. Methods: We propose YOLOv11-MTB, a novel enhancement of YOLOv11 that integrates multi-scale Transformer-based attention, boundary-aware feature fusion, and receptive field augmentation to improve the detection of small and multifocal lesions. The model is trained and evaluated on two multi-center datasets, including the BTXRD dataset of radiographs annotated with lesion types and bounding boxes. Results: YOLOv11-MTB achieves state-of-the-art performance on bone tumor detection, attaining a mean average precision (mAP) of 79.6% on the BTXRD dataset and outperforming existing methods. In the clinically relevant small-lesion and multi-lesion categories, it achieves an mAP of 55.8% and 63.2%, respectively. Conclusions: The proposed YOLOv11-MTB framework shows promising accuracy and generalization for primary bone tumor detection in radiographs, and its performance on small and multiple lesions suggests potential for clinical application.
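The results above are reported as mAP, but the abstract does not state the evaluation protocol (IoU threshold, interpolation scheme, or how the small-lesion and multi-lesion subsets are defined). As a rough illustration of what the metric measures, the sketch below assumes a PASCAL VOC-style protocol: per-class average precision at IoU 0.5 with all-point interpolation, averaged over classes. The function names (`iou`, `average_precision`, `mean_average_precision`) are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, score, box) for one class
    ground_truth: Dict[str, List[Box]],        # image_id -> ground-truth boxes of that class
    iou_thr: float = 0.5,                      # assumed VOC-style threshold
) -> float:
    """AP for one class with greedy matching and all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tps: List[int] = []
    # Process detections from most to least confident.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A detection is a true positive only if its best-matching box is unclaimed.
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tps.append(1)
        else:
            tps.append(0)

    # Cumulative precision/recall after each ranked detection.
    recalls, precisions, tp_cum = [], [], 0
    for k, t in enumerate(tps, start=1):
        tp_cum += t
        recalls.append(tp_cum / n_gt)
        precisions.append(tp_cum / k)

    # All-point interpolation: make precision monotonically non-increasing,
    # then sum the area under the step-shaped precision-recall curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Tuple[str, float, Box]], Dict[str, List[Box]]]],
    iou_thr: float = 0.5,
) -> float:
    """Unweighted mean of per-class AP values."""
    aps = [average_precision(d, g, iou_thr) for d, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Under this assumed protocol, a "small-lesion mAP" would be obtained by running the same computation on only the ground-truth boxes (and matching detections) that fall below some size cutoff; the cutoff used in the study is not given here.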
Chen et al. (Fri,) studied this question.