What question did this study set out to answer?

This article aims to summarize methods for deep video inpainting and detection, highlighting their categories and performance.

March 26, 2026 (Open Access)

Deep video inpainting and video inpainting detection: A comprehensive survey from deep learning perspective

Key Points

Classified existing deep video inpainting methods into categories: 3D convolution-based, optical flow-based, alignment-based, temporal shift-based, attention-based, and diffusion-based.
Sorted detection methods into spatial-domain, temporal-domain, frequency-domain, and hybrid-domain networks.
Reviewed training objectives, loss functions, and benchmark datasets for the categorized methods.
Conducted qualitative and quantitative evaluations of the methods using video-level and pixel-level metrics.
Identified advantages and disadvantages of various deep video inpainting techniques.
Outlined training approaches and common datasets used in deep video inpainting and detection.
Provided a comprehensive overview of existing research addressing the potential security threats posed by deep video inpainting.

Abstract

With advances in deep learning, video inpainting has made significant progress in recent years, giving rise to deep learning-based video inpainting, also known as deep video inpainting. It learns the underlying rules or feature distributions of a video dataset in a data-driven manner in order to complete the missing regions of a video from a spatio-temporal perspective. Its original goal is to recover damaged or lost parts of videos, but it can also be used maliciously to remove target objects. As a result, the development of deep video inpainting has brought negative effects and potential threats to countries, society, and individuals. Detecting such inpainting has therefore attracted wide research interest in the field of information security. The primary objective of this article is to provide a comprehensive summary of deep video inpainting and the corresponding detection methods. Specifically, we classify existing deep video inpainting methods by the deep learning module they are built on: 3D convolution-based, optical flow-based, alignment-based, temporal shift-based, attention-based, and diffusion-based network models. We also sort existing research on deep video inpainting detection into four categories from a network feature analysis perspective: spatial-domain, temporal-domain, frequency-domain, and hybrid-domain network models. In addition, we review their training objectives, loss functions, and common benchmark datasets. We present video-level and pixel-level evaluation metrics, conduct qualitative and quantitative evaluations, and discuss the advantages and disadvantages of representative deep video inpainting methods and their corresponding detection methods. Finally, we outline potential future research directions for deep video inpainting and its detection.
