The high velocity of tennis balls makes it difficult to perceive trajectories and localize impact points visually, which frequently leads to erroneous line calls. Ball-tracking systems have improved accuracy, but they are hindered by high equipment costs, dependence on manual calibration, and limited real-time processing capability. Deep learning-based approaches offer potential cost reductions but suffer from inherent detection latency and higher missed-detection rates for small, fast-moving objects. This work proposes a multimodal fusion system that integrates event cameras with conventional video streams. A spatio-temporal feature alignment module synchronizes the heterogeneous data streams and reduces the false detection rate caused by motion blur in high-speed video by 58%. A Lightweight Hybrid Network (LHNet) is designed, and Dynamic Sparse Convolution (DSC) is employed to improve computational efficiency for real-time operation. A Ballistic Trajectory Predictor (BTP) incorporates aerodynamic drag and spin coefficients within a physics-constrained model, reducing the impact point prediction error to 1.2 cm, a significant improvement over the 3.5 cm error typical of conventional methods. Experimental validation on the TrackNet dataset shows that the system achieves a detection accuracy of 98.7% (mAP@0.5) and a trajectory tracking ID-switch rate of only 0.5%, a 2.6-fold improvement over DeepSORT.
Guoqing Chen (Sun,) studied this question.
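
To make the idea of a physics-constrained impact-point prediction concrete, the following is a minimal sketch of a point-mass ball model with quadratic drag and a Magnus (spin) term, integrated forward until the ball reaches the court. It is not the BTP described above. The function names, the midpoint integrator, and all constants (ball mass and radius, air density, and the drag and lift coefficients `cd` and `cl`) are illustrative assumptions, not values taken from the source.

```python
import math

# Illustrative constants (assumptions, not taken from the source).
G = 9.81                       # gravitational acceleration, m/s^2
RHO = 1.21                     # air density, kg/m^3
MASS = 0.057                   # approximate tennis ball mass, kg
RADIUS = 0.033                 # approximate tennis ball radius, m
AREA = math.pi * RADIUS ** 2   # cross-sectional area, m^2


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def acceleration(vel, spin_axis, cd, cl):
    """Acceleration from gravity, quadratic drag, and a Magnus term.

    spin_axis is a unit vector along the angular velocity (or zeros for no
    spin); cd and cl are assumed constant drag and lift coefficients.
    """
    speed = math.sqrt(sum(c * c for c in vel))
    k = 0.5 * RHO * AREA * speed / MASS
    magnus = cross(spin_axis, vel)
    return (-k * cd * vel[0] + k * cl * magnus[0],
            -k * cd * vel[1] + k * cl * magnus[1],
            -k * cd * vel[2] + k * cl * magnus[2] - G)


def predict_impact(pos, vel, spin_axis=(0.0, 0.0, 0.0),
                   cd=0.55, cl=0.0, dt=1e-3, max_time=10.0):
    """Integrate with the midpoint method until z crosses 0.

    Returns (x, y, t) at ground contact, linearly interpolated within the
    final step. Axes: x forward, y left, z up, in metres.
    """
    pos, vel, t = tuple(pos), tuple(vel), 0.0
    while t < max_time:
        a1 = acceleration(vel, spin_axis, cd, cl)
        v_mid = tuple(v + 0.5 * dt * a for v, a in zip(vel, a1))
        a2 = acceleration(v_mid, spin_axis, cd, cl)
        new_pos = tuple(p + dt * v for p, v in zip(pos, v_mid))
        new_vel = tuple(v + dt * a for v, a in zip(vel, a2))
        if new_pos[2] <= 0.0:
            frac = pos[2] / (pos[2] - new_pos[2])
            return (pos[0] + frac * (new_pos[0] - pos[0]),
                    pos[1] + frac * (new_pos[1] - pos[1]),
                    t + frac * dt)
        pos, vel, t = new_pos, new_vel, t + dt
    raise ValueError("ball did not reach the ground within max_time")


if __name__ == "__main__":
    # Flat shot from 1 m height at 30 m/s, with topspin about the +y axis.
    x, y, t = predict_impact((0.0, 0.0, 1.0), (30.0, 0.0, 0.0),
                             spin_axis=(0.0, 1.0, 0.0), cl=0.2)
    print(f"impact at x={x:.2f} m, y={y:.2f} m after t={t:.3f} s")
```

In this sketch, topspin about the +y axis produces a downward Magnus force on a ball travelling in +x, so the predicted impact point moves closer to the hitter than in the no-spin case. In a learned pipeline, `cd`, `cl`, and the spin axis would presumably be estimated from the tracked trajectory rather than fixed by hand.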