What question did this study set out to answer?

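The study aims to improve learned surveillance video coding by introducing long-term reference (LTR) and addressing two obstacles: the highly unequal importance of long and short motion, and the excessive overhead of dense motion representation.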

April 18, 2026 · Open Access

Exploring Learned Surveillance Video Coding with Long-Term Reference and Adaptive Long–Short Modeling

Key Points

Developed a long-term reference (LTR) baseline for learned video coding.
Introduced a long-short context mining module to enhance coding performance.
Proposed a long-short motion adapter to balance motion importance.
Implemented a historical motion guidance module for improved motion decoding.
Achieved 13.89% BD-rate savings in YUV-PSNR over the H.266/VVC anchor under a low-delay P configuration.
Improved on the EEM-4.1 baseline, which shows a 1.86% BD-rate loss against the same anchor.
Consumes fewer computational resources than DCVC-FM, although it does not match DCVC-FM's 44.01% gains.

Abstract

Video coding plays a critical role in efficient transmission for surveillance camera sensors. Although long-term reference (LTR) has been studied extensively in traditional hand-designed video coding, its potential in learned video coding remains unexplored. Two obstacles stand out: the highly unequal importance of long and short motion, and excessive motion overhead, especially for dense motion representations such as optical flow. In this paper, we build an LTR baseline for learned surveillance video coding and propose an adaptive long–short modeling approach to address these problems. Specifically, we first add LTR and a long–short context mining module to the authorized end-to-end video coding exploration model (EEM) from China’s AVS to form a baseline. Since the quality of the LTR significantly affects its performance and importance, we subsequently enhance it. Then, we propose a long–short motion adapter to address the unequal importance. Finally, a historical motion guidance module is introduced to aid motion decoding. Experimental results show that, measured in YUV-PSNR against the H.266/VVC anchor under a low-delay P configuration, the proposed approach turns the 1.86% BD-rate loss of EEM-4.1 into 13.89% BD-rate savings. Although these results do not match the 44.01% gains of DCVC-FM, the proposed approach consumes fewer computational resources, and we believe that integrating the proposed LTR method with stronger baselines will further boost performance.
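
For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of how such a number can be derived from rate-distortion points. It is not the paper's evaluation code. It assumes a simplified variant that interpolates log-bitrate piecewise-linearly over PSNR, whereas the usual Bjøntegaard procedure fits a cubic polynomial. The function name `bd_rate_linear` and all sample values are hypothetical and chosen only for illustration.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x from sample points sorted by xs."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled PSNR range")


def _mean_log_rate(psnr, rate, lo, hi, steps=1000):
    """Average log-bitrate over the PSNR interval [lo, hi] (trapezoid rule)."""
    pts = sorted(zip(psnr, rate))
    xs = [p for p, _ in pts]
    ys = [math.log(r) for _, r in pts]
    h = (hi - lo) / steps
    # Clamp the grid to hi so rounding never steps outside the sampled range.
    grid = [min(lo + i * h, hi) for i in range(steps + 1)]
    vals = [_interp(x, xs, ys) for x in grid]
    area = sum((vals[i] + vals[i + 1]) / 2 * h for i in range(steps))
    return area / (hi - lo)


def bd_rate_linear(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec versus the anchor at
    equal PSNR. Negative values mean bitrate savings, positive values a loss.
    PSNR values within each curve are assumed to be distinct."""
    # Compare only over the PSNR range that both curves cover.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_anchor = _mean_log_rate(psnr_anchor, rate_anchor, lo, hi)
    avg_test = _mean_log_rate(psnr_test, rate_test, lo, hi)
    return (math.exp(avg_test - avg_anchor) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical RD points (kbps, dB); not measurements from the paper.
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
    anchor_psnr = [32.0, 35.0, 38.0, 41.0]
    test_rate = [0.9 * r for r in anchor_rate]  # 10% fewer bits at equal PSNR
    print(round(bd_rate_linear(anchor_rate, anchor_psnr, test_rate, anchor_psnr), 4))
```

In this hypothetical example the test codec spends 10% fewer bits at every quality level, so the result is a BD-rate of about -10%, which would be reported as 10% BD-rate savings. Under the same sign convention, the 1.86% loss reported for EEM-4.1 corresponds to a positive value and the 13.89% savings to a negative one.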

Cite This Study

Wu et al. studied this question.

synapsesocial.com/papers/69e31fcb40886becb653ef55 https://doi.org/10.3390/s26082461
