What question did this study set out to answer?

The aim is to improve vehicle detection in low-resolution traffic videos by integrating video super-resolution and object detection.

March 12, 2026 | Open Access

A Cascaded Framework for Vehicle Detection in Low-Resolution Traffic Surveillance Videos

Key Points

Developed a cascaded framework using video super-resolution (VSR) and object detection.
Employed a transformer-based VSR model with masked intra- and inter-frame attention.
Constructed a domain-specific super-resolved dataset to train a lightweight YOLOF detector.
Conducted experiments on public datasets including REDS, Vimeo90k, and UA-DETRAC.
Achieved a mAP@0.5 of 56.89 on low-resolution UA-DETRAC, outperforming direct inference (39.17 mAP@0.5).
Improved performance by 17.72 points over direct low-resolution inference and 11.19 points over conventional fine-tuning.
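
The "@0.5" in mAP@0.5 refers to the intersection-over-union (IoU) threshold at which a predicted box counts as a correct detection. The following is a minimal sketch, not the authors' evaluation code, showing that matching rule and re-deriving the reported point gains from the three mAP@0.5 values given in the abstract. The function names, the dictionary keys, and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred, truth) >= threshold


# mAP@0.5 on low-resolution UA-DETRAC as reported in the abstract.
# The key names are hypothetical labels chosen for this sketch.
REPORTED_MAP50 = {
    "direct_lr_inference": 39.17,
    "conventional_fine_tuning": 45.70,
    "cascaded_framework": 56.89,
}


def gain_over(baseline: str, method: str = "cascaded_framework") -> float:
    """Absolute mAP@0.5 gain of `method` over `baseline`, in points."""
    return round(REPORTED_MAP50[method] - REPORTED_MAP50[baseline], 2)
```

Under these assumptions, `gain_over("direct_lr_inference")` and `gain_over("conventional_fine_tuning")` reproduce the 17.72 and 11.19 point improvements quoted above.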

Abstract

Traffic surveillance cameras, as core sensing devices in smart cities, are crucial for traffic management, violation detection, and autonomous driving. However, due to deployment constraints and hardware limitations, the videos they capture often suffer from low resolution and noise, leading to missed and false detections in traditional object detection algorithms trained on high-resolution data. To address this issue, this study proposes a cascaded collaborative framework that integrates video super-resolution (VSR) and object detection for robust perception in low-quality traffic surveillance scenarios. First, a transformer-based VSR model with masked intra- and inter-frame attention (MIA-VSR) is employed to reconstruct temporally coherent high-resolution video sequences from degraded inputs. A domain-specific super-resolved dataset is subsequently constructed to train a lightweight one-stage detector (You Only Look One-level Feature, YOLOF) for efficient vehicle localisation. Extensive experiments on public datasets (REDS, Vimeo90k, UA-DETRAC) demonstrate that the proposed framework achieved a 56.89 mAP@0.5 on low-resolution UA-DETRAC, outperforming both direct low-resolution inference (39.17 mAP@0.5) and conventional fine-tuning strategies (45.70 mAP@0.5) by 17.72 and 11.19 points, respectively. These findings indicate that super-resolution-driven data reconstruction provides an effective pathway for mitigating feature degradation in low-quality surveillance environments, offering both theoretical insight and practical value for intelligent transportation perception systems.
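
The cascade described in the abstract can be pictured as two stages applied in sequence at inference time: a VSR model that reconstructs a whole low-resolution clip, followed by a detector run on each reconstructed frame. The sketch below illustrates only that control flow under stated assumptions. `nearest_upscale` is a trivial placeholder standing in for MIA-VSR, `full_frame_detector` is a dummy standing in for the trained YOLOF detector, and all names and type aliases are hypothetical rather than taken from the paper. Training the detector on the super-resolved dataset is not covered.

```python
from typing import Callable, List, Tuple

# Assumed representations for this sketch only.
Frame = List[List[int]]                              # rows of grayscale pixels
Detection = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
VSRModel = Callable[[List[Frame]], List[Frame]]
Detector = Callable[[Frame], List[Detection]]


def nearest_upscale(clip: List[Frame], scale: int = 4) -> List[Frame]:
    """Placeholder for a VSR model: nearest-neighbour upscaling per frame.

    A real VSR model would exploit neighbouring frames; this stand-in does not.
    """
    return [
        [[px for px in row for _ in range(scale)]
         for row in frame for _ in range(scale)]
        for frame in clip
    ]


def full_frame_detector(frame: Frame) -> List[Detection]:
    """Dummy detector: returns one box covering the whole frame."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    return [(0.0, 0.0, float(width), float(height), 1.0)]


def cascaded_detect(
    lr_clip: List[Frame], vsr: VSRModel, detector: Detector
) -> List[List[Detection]]:
    """Stage 1: super-resolve the clip. Stage 2: detect on each SR frame."""
    # The whole clip is passed to the VSR stage so that a temporal model
    # can use inter-frame context when reconstructing each frame.
    sr_clip = vsr(lr_clip)
    return [detector(frame) for frame in sr_clip]
```

A usage example under the same assumptions: `cascaded_detect(clip, lambda c: nearest_upscale(c, 4), full_frame_detector)` returns one list of detections per input frame, with box coordinates expressed in the super-resolved frame's pixel grid rather than the original low-resolution one.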
