Minimally invasive surgery (MIS) offers significant benefits to patients, and although the integration of AI into computer-assisted surgery (CAS) remains largely a research topic, the ultimate goal is clinical deployment. To address the lack of robust multi-instrument tracking in diverse and occluded surgical scenes, we propose a unified framework that combines a hierarchical feature convolutional attention (HFCA) module, a graph neural network (GNN)-based tracker, and a novel Surgery Scenario Correction (SSC) algorithm that maintains identity consistency. The framework improves both detection and tracking accuracy under challenging conditions. To support this work, we construct MIXsurg, a multi-type surgical instrument tracking dataset developed under physician guidance. MIXsurg provides clinically diverse annotated video sequences in a unified multi-object tracking (MOT) format, enabling benchmarking for real-world applications. Experimental results show that the proposed model achieves a MOTA of 47.1%, a MOTP of 54.9%, and 72 ID switches, significantly outperforming existing popular methods. Ablation studies validate the contribution of each component, particularly the SSC algorithm, which reduces the number of ID switches from 433 to 72. These results demonstrate the model’s robustness and effectiveness, offering a new solution for surgical instrument tracking and contributing to surgical automation.
Feng et al. studied this question.
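To make the role of ID switches in the reported metrics concrete, the following is a minimal sketch of how MOTA is commonly computed, assuming the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT. The abstract does not state which exact definition or tooling the authors used, so this is an illustration and not their evaluation code. The `MotCounts` container and the `mota` function are hypothetical names, and every count below is made up for illustration except the two ID-switch values (433 and 72), which are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class MotCounts:
    """Error counts summed over all frames of a sequence (hypothetical container)."""
    num_gt: int           # total ground-truth boxes across all frames
    false_negatives: int  # ground-truth boxes with no matched prediction
    false_positives: int  # predictions with no matched ground-truth box
    id_switches: int      # times the identity assigned to an object changes


def mota(c: MotCounts) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT."""
    if c.num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = c.false_negatives + c.false_positives + c.id_switches
    return 1.0 - errors / c.num_gt


# Illustrative counts only: num_gt, FN, and FP are NOT from the paper.
# Only the ID-switch values (433 without SSC, 72 with SSC) come from the abstract.
without_ssc = MotCounts(num_gt=10_000, false_negatives=3_000,
                        false_positives=2_000, id_switches=433)
with_ssc = MotCounts(num_gt=10_000, false_negatives=3_000,
                     false_positives=2_000, id_switches=72)

if __name__ == "__main__":
    print(f"MOTA without SSC (illustrative): {mota(without_ssc):.4f}")
    print(f"MOTA with SSC (illustrative):    {mota(with_ssc):.4f}")
```

Under this definition, holding misses and false positives fixed, each removed ID switch raises MOTA by 1/GT, which is why an identity-correction step can improve MOTA without changing the detector.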