Detecting and tracking a small, fast-moving table tennis ball requires high spatial precision and temporal stability under motion blur and occlusion. This study proposes a detector–tracker pipeline tailored to this setting. On the detection side, a Dilated Reparameterization Block (DRB) expands the receptive field with multi-branch dilated kernels that are reparameterized into a single equivalent kernel at inference, while a Dilation-wise Residual (DWR) path enables progressive multi-scale fusion in the backbone. For 3D trajectory tracking, a calibrated binocular setup reconstructs ball coordinates via stereo triangulation and PnP. A Siamese–Kalman Fusion network (SKF-Net) then tracks the ball: the Siamese branch outputs correlation-based positions, and the Kalman filter refines the state sequentially to smooth drift and recover from occlusion. Experiments use the PaddlePing dataset (≈5000 annotated images) for detection and stereo videos captured at 1080p/60 FPS under indoor lighting for 3D tracking. Compared with YOLOv5, YOLOv8, and RetinaNet baselines, the detector raises mAP to 0.91 ± 0.01 at a cost of about 1 ms additional latency. In tracking, SKF-Net attains a mean reprojection error of 2.3 px and a mean 3D deviation of 8.6 mm, with the fastest occlusion recovery time (14.8 ± 1.2 ms). These results indicate practical value for coaching analytics, training feedback, and broadcast analysis in sports.
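The reparameterization idea behind the DRB can be illustrated in one dimension. The sketch below is not the paper's implementation: it assumes plain convolution branches without the normalization layers such blocks usually carry, and all function names and kernel values are hypothetical. It shows only the underlying identity, namely that a dilated kernel equals a dense kernel with zeros inserted between its taps, so parallel dilated branches can be summed into one kernel.

```python
def conv1d(signal, kernel):
    """Zero-padded 'same' 1D cross-correlation (odd kernel length)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 'same' 1D cross-correlation with a dilated kernel."""
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def dilate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between taps to get the dense equivalent."""
    dense = [0.0] * ((len(kernel) - 1) * dilation + 1)
    for idx, weight in enumerate(kernel):
        dense[idx * dilation] = weight
    return dense


def merge_branches(main_kernel, branches):
    """Fold (kernel, dilation) branches into the main kernel, centre-aligned."""
    merged = list(main_kernel)
    centre = len(merged) // 2
    for kernel, dilation in branches:
        dense = dilate_kernel(kernel, dilation)
        offset = centre - len(dense) // 2
        if offset < 0:
            raise ValueError("dilated branch is wider than the main kernel")
        for idx, weight in enumerate(dense):
            merged[offset + idx] += weight
    return merged


# Illustrative values only (assumptions, not taken from the paper).
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0]
main_kernel = [0.1, 0.2, 0.3, 0.2, 0.1]
branches = [([0.5, -1.0, 0.5], 1), ([1.0, 0.0, -1.0], 2)]

# Training-time form: the main branch plus each dilated branch, summed.
multi_branch_out = conv1d(signal, main_kernel)
for kernel, dilation in branches:
    branch_out = dilated_conv1d(signal, kernel, dilation)
    multi_branch_out = [a + b for a, b in zip(multi_branch_out, branch_out)]

# Inference-time form: one convolution with the merged kernel.
merged_kernel = merge_branches(main_kernel, branches)
single_out = conv1d(signal, merged_kernel)

max_diff = max(abs(a - b) for a, b in zip(multi_branch_out, single_out))
print("largest difference between the two forms:", max_diff)
```

Because convolution is linear in the kernel, the two forms agree up to floating-point rounding, which is why the extra branches need not add inference cost once merged.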
Wenwen Chen (Fri,) studied this question.
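The Kalman refinement stage described in the abstract can be illustrated with a minimal constant-velocity filter on a single image coordinate. This is a sketch under stated assumptions rather than SKF-Net itself: the Siamese branch is replaced by synthetic measurements, occlusion is modelled as a missing measurement (`None`), and the class name, noise parameters, and motion model are all hypothetical.

```python
class ConstantVelocityKalman1D:
    """Linear Kalman filter with state (position, velocity), measuring position."""

    def __init__(self, position, dt, process_var=0.01, meas_var=4.0):
        self.dt = dt
        self.p = position          # position estimate
        self.v = 0.0               # velocity estimate (unknown at start)
        # Covariance: position as uncertain as one measurement, velocity very uncertain.
        self.P = [[meas_var, 0.0], [0.0, 1000.0]]
        self.q = process_var       # acceleration-noise variance (assumed)
        self.r = meas_var          # measurement-noise variance (assumed)

    def predict(self):
        dt = self.dt
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q * dt ** 4 / 4.0,
             b + dt * d + self.q * dt ** 3 / 2.0],
            [c + dt * d + self.q * dt ** 3 / 2.0,
             d + self.q * dt * dt],
        ]

    def update(self, z):
        (a, b), (c, d) = self.P
        innovation = z - self.p
        s = a + self.r                     # innovation variance
        k0, k1 = a / s, c / s              # Kalman gain for H = [1, 0]
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        self.P = [
            [(1.0 - k0) * a, (1.0 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]

    def step(self, z):
        """Advance one frame; skip the update when the ball is occluded."""
        self.predict()
        if z is not None:
            self.update(z)
        return self.p


# Synthetic track: 2 px per frame for 10 frames, then 3 occluded frames.
kf = ConstantVelocityKalman1D(position=0.0, dt=1.0)
measurements = [2.0 * t for t in range(1, 11)] + [None] * 3
estimates = [kf.step(z) for z in measurements]
print("estimate after occlusion:", estimates[-1])
```

During the occluded frames the filter coasts on its velocity estimate while the position variance grows, so the first measurement after the ball reappears is weighted heavily and pulls the state back quickly.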