With the increasing demand for video processing in both human perception and machine vision applications, enhancing heavily compressed video has become a critical problem in practical multimedia systems. In many real-world scenarios, video data acquired by image sensors are often compressed for efficient transmission and storage, which introduces compression artifacts and degrades both visual quality and downstream task performance. This issue is especially significant in sensor-based systems such as surveillance cameras and mobile imaging devices. To address these challenges, we propose a novel joint human–machine video enhancement framework for compressed video enhancement that jointly targets human perceptual quality and machine vision performance. The framework integrates four complementary components: a Spatio-Temporal Fusion Module that leverages inter-frame correlations, a High-Frequency Semantic Fusion module for recovering structurally important details relevant to machine tasks, a Texture-Guided Model that enhances low-level visual features, and a Refined Attention Residual Quality Enhancement Module that adaptively emphasizes salient regions. By progressively combining these modules, the framework effectively restores compressed content while preserving task-relevant semantics. The experimental results demonstrate that our method consistently outperforms existing approaches, achieving higher PSNR and SSIM as well as improved object detection and video object segmentation performance. These results highlight the framework’s practical applicability for compressed video enhancement in sensor-based systems, including intelligent surveillance and autonomous imaging platforms.
He et al. (Mon,) studied this question.