Single-image based camera auto-calibration holds significant value for improving perception efficiency in traffic surveillance systems. However, existing approaches face dual challenges: scarcity of real-world datasets and poor adaptability to multi-view scenarios. This paper presents a systematic solution framework. First, we constructed a large-scale synthetic dataset containing 36 highway scenarios using the CARLA 0.9.15 simulation engine, generating approximately 336,000 virtual frames with precise calibration parameters. The dataset achieves statistical consistency with real-world scenes by incorporating diverse view distributions, complex weather conditions, and varied road geometries. Second, we developed DeepCalib, a deep calibration network that explicitly models perspective projection features through the triplet attention mechanism. This network simultaneously achieves road direction vanishing point localization and camera pose estimation using only a single image. Finally, we adopted a progressive learning paradigm: robust pre-training on synthetic data establishes universal feature representations in the first stage, followed by fine-tuning on real-world datasets in the second stage to enhance practical adaptability. Experimental results indicate that DeepCalib attains an average calibration precision of 89.6%. Compared to conventional multi-stage algorithms, our method achieves a single-frame processing speed of 10 frames per second, showing robust adaptability to dynamic calibration tasks across diverse surveillance views.
Zhang et al. (Thu,) studied this question.