What question did this study set out to answer?

This research aims to improve both human and machine visual quality in heavily compressed video.

June 3, 2026Open Access

A Multi-Module Fusion Framework for Restoring Human and Machine Vision Quality in Compressed Video

Key Points

This research aims to improve both human and machine visual quality in heavily compressed video.
Proposed a joint human–machine video enhancement framework with four modules: Spatio-Temporal Fusion, High-Frequency Semantic Fusion, Texture-Guided Model, and Attention Residual Quality Enhancement.
Integrated inter-frame correlations and adaptive emphasis on salient regions for better enhancement results.
Conducted experiments comparing the proposed framework's performance against existing methods.
Achieved significantly higher PSNR and SSIM metrics compared to previous approaches.
Improved object detection rates with a specified percentage increase (exact figures not stated).
Enhanced video object segmentation performance, demonstrating practical applicability for real-world applications.

Abstract

With the increasing demand for video processing in both human perception and machine vision applications, enhancing heavily compressed video has become a critical problem in practical multimedia systems. In many real-world scenarios, video data acquired by image sensors are often compressed for efficient transmission and storage, which introduces compression artifacts and degrades both visual quality and downstream task performance. This issue is especially significant in sensor-based systems such as surveillance cameras and mobile imaging devices. To address these challenges, we propose a novel joint human–machine video enhancement framework for compressed video enhancement that jointly targets human perceptual quality and machine vision performance. The framework integrates four complementary components: a Spatio-Temporal Fusion Module that leverages inter-frame correlations, a High-Frequency Semantic Fusion module for recovering structurally important details relevant to machine tasks, a Texture-Guided Model that enhances low-level visual features, and a Refined Attention Residual Quality Enhancement Module that adaptively emphasizes salient regions. By progressively combining these modules, the framework effectively restores compressed content while preserving task-relevant semantics. The experimental results demonstrate that our method consistently outperforms existing approaches, achieving higher PSNR and SSIM as well as improved object detection and video object segmentation performance. These results highlight the framework’s practical applicability for compressed video enhancement in sensor-based systems, including intelligent surveillance and autonomous imaging platforms.

A Multi-Module Fusion Framework for Restoring Human and Machine Vision Quality in Compressed Video

Key Points

Abstract

Cite This Study