What question did this study set out to answer?

This research aims to develop a novel architecture, TriORU2-Net++, to effectively remove occlusions in light-field images.

June 3, 2026 · Open Access

TriORU2-Net++: attention-guided three-stage U2-Net++ for light field occlusion removal

Key Points

Introduced a three-stage architecture for occlusion removal in light-field (LF) images.
Utilized a ResASPP-AttFPN feature extractor for multiscale feature integration.
Implemented a tri-stage U2-Net++ reconstruction module with hierarchical encoder-decoder stages.
Achieved average improvements of 0.86 dB in PSNR and 0.016 in SSIM across various LF datasets.
Demonstrated superior performance compared to state-of-the-art LF occlusion removal methods.
Supported scalable pipelines for reliable visual data processing.
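
To put the PSNR figure in the key points in context, here is a minimal sketch of the standard PSNR definition, written in plain Python over flat pixel lists. It is not the authors' evaluation code: the function names, the 8-bit peak value of 255, and the helper mse_ratio_for_gain are assumptions made for illustration.

```python
import math

def mse(reference, restored):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)

def psnr(reference, restored, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). Assumes 8-bit pixels by default."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)

def mse_ratio_for_gain(gain_db):
    """Hypothetical helper: factor by which MSE shrinks for a given PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)

# Toy 4-pixel example (made-up values, not data from the paper).
ground_truth = [52, 55, 61, 59]
reconstruction = [54, 55, 60, 57]
print(round(psnr(ground_truth, reconstruction), 2))
print(round(mse_ratio_for_gain(0.86), 3))
```

Read this way, a 0.86 dB gain on a single image pair corresponds to an MSE about 18% lower (a ratio of roughly 0.82). The reported figure is an average over datasets, so this conversion is only indicative.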

Abstract

We introduce TriORU²-Net++, a novel three-stage architecture designed to address the persistent challenge of occlusion removal in light-field (LF) images by leveraging adaptive attention-guided feature integration and progressive hierarchical reconstruction. Unlike existing methods that struggle to fully exploit spatial hierarchies and adaptively restore occluded regions across scales, our model incorporates a ResASPP-AttFPN feature extractor, which integrates Residual Atrous Spatial Pyramid Pooling (ResASPP) with a spatial attention-enhanced Feature Pyramid Network (AttFPN) to selectively fuse multiscale features while emphasizing salient spatial cues essential for occlusion localization. The core of our framework is a tri-stage U²-Net++ reconstruction module, which performs progressive restoration through three hierarchically connected encoder-decoder stages of decreasing depth (4-level, 3-level, and 2-level), each built on VGG-based blocks and dense skip connections to recover increasingly refined background content. To further enhance detail preservation and structural consistency, we introduce a residual feature refiner (RFR) that consolidates residual cues and sharpens object boundaries. Extensive experimental evaluations demonstrate that the proposed method surpasses recent state-of-the-art (SOTA) LF occlusion removal approaches in both quantitative metrics and visual reconstruction quality. Specifically, our model achieves average improvements of 0.86 dB in PSNR and 0.016 in SSIM across real-world (CD scene) and synthetic LF datasets, including sparse (4-Syn, 9-Syn) and dense (Single-Occ, Double-Occ) settings. This capability is particularly relevant to the Big Data paradigm, where large-scale visual datasets demand robust preprocessing to remove occlusions and ensure reliable downstream analytics. By improving LF data fidelity while remaining efficient, our model supports scalable pipelines for high-volume visual data processing.
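
The idea behind the ResASPP part of the feature extractor, parallel atrous (dilated) convolutions at several rates combined with a residual connection, can be illustrated with a toy 1D sketch. This is not the paper's implementation: the real module operates on 2D feature maps with learned weights, and the dilation rates (1, 2, 4), the averaging used to fuse branches, and the function names below are all assumptions chosen to keep the example small.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-size 1D atrous convolution with zero padding.

    Implemented as cross-correlation, as in deep learning frameworks.
    Kernel taps are spaced `rate` samples apart; kernel length must be odd.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # tap position in the input
            if 0 <= j < len(signal):   # positions outside count as zero
                acc += w * signal[j]
        out.append(acc)
    return out

def res_aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Toy residual atrous pyramid: fuse parallel branches, add the input back.

    Assumption: branches are averaged here; a real ASPP typically
    concatenates them and fuses with a learned 1x1 convolution.
    """
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    fused = [sum(vals) / len(rates) for vals in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]

# A unit impulse shows how larger rates widen the receptive field.
impulse = [0, 0, 0, 1, 0, 0, 0]
for r in (1, 2, 4):
    print(r, dilated_conv1d(impulse, [1, 1, 1], r))
print(res_aspp_1d(impulse, [1, 1, 1]))
```

With the same three-tap kernel, each larger rate reaches samples farther from the center without adding parameters, which is what lets a pyramid of rates gather context at multiple scales.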

Cite This Study

Senussi et al. studied this question.

synapsesocial.com/papers/6a1fc4bbdee9eb8c0dce635a https://doi.org/10.1186/s40537-026-01457-x