March 12, 2026 | Open Access

Dynamic background motion object semantic segmentation algorithm based on generative adversarial network and transformer collaboration

Key points

The study aims to improve semantic segmentation of moving objects against dynamic backgrounds by integrating GANs and Transformers.
It develops a GAN-Transformer architecture with a gated fusion strategy.
A conditional GAN generates varied dynamic background samples.
A temporal attention module incorporates motion vector fields.
A KL-constrained semantic consistency loss is used to optimize the model.
The model achieves an average Intersection over Union (IoU) of 85.6% in standard scenes, outperforming DeepLabv3+ by 9.2 percentage points.
The robustness index reaches 92.0% in low-light and high-speed scenarios, 8.5 points higher than baseline models.
Performance stays within 84.5–86.5% over 20-frame sequences, with fluctuation reduced by 63%.
Adversarial training improves adaptability to lighting changes by 5.3%.
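
For readers unfamiliar with the IoU metric quoted above, the following is a minimal sketch of how per-class IoU and its mean over classes are commonly computed on label maps. It is the standard definition (intersection of predicted and ground-truth pixels divided by their union), not the authors' evaluation code; the function names `iou` and `mean_iou` and the tiny 2x2 label grids are illustrative assumptions.

```python
def iou(pred, target, cls):
    """IoU for one class over two equally sized label grids (lists of lists)."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p_hit = (p == cls)
            t_hit = (t == cls)
            inter += int(p_hit and t_hit)  # pixel labelled cls in both maps
            union += int(p_hit or t_hit)   # pixel labelled cls in at least one map
    # A class absent from both maps has no defined IoU.
    return inter / union if union else float("nan")


def mean_iou(pred, target, classes):
    """Average the per-class IoU, skipping classes absent from both maps."""
    scores = [iou(pred, target, c) for c in classes]
    valid = [s for s in scores if s == s]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical 2x2 example: 0 = background, 1 = moving object.
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]

object_iou = iou(pred, target, 1)        # 2 shared pixels / 3 in the union
overall = mean_iou(pred, target, [0, 1])
```

Because IoU is itself a percentage, a gain such as the 9.2-point margin over DeepLabv3+ is a difference between two IoU values, not a relative improvement.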

Abstract

Semantic segmentation of moving objects in dynamic backgrounds faces core challenges such as background interference and blurred target features. This study proposes an architecture that integrates a Generative Adversarial Network (GAN) with a Transformer. The GAN module enhances adaptability to dynamic backgrounds through adversarial training, while the self-attention mechanism in the Transformer captures long-range semantic dependencies. A gated fusion strategy dynamically balances the multimodal features. The method employs a conditional GAN to generate dynamic background samples with variations in illumination and motion blur. A Transformer-based encoder-decoder structure models global contextual relationships. A temporal attention module incorporates motion vector fields, improving temporal consistency. Additionally, a semantic consistency loss constrained by Kullback-Leibler (KL) divergence optimizes the plausibility of generated samples. Experiments are conducted on both a multi-dimensional simulated dataset and the real-world KITTI dataset. Results show that the proposed model achieves an average Intersection over Union (IoU) of 85.6% in standard dynamic scenes, outperforming DeepLabv3+ by 9.2 percentage points. In low-light and high-speed motion scenarios, the robustness index reaches 92.0%, 8.5 points higher than baseline models. Ablation studies show that removing the Transformer leads to a 6.7% drop in mIoU, while excluding the feature fusion module reduces robustness by 4.0%, confirming the necessity of both components. Temporal analysis shows that the model maintains a stable performance of 84.5–86.5% over 20-frame sequences, with fluctuation reduced by 63% compared to the baseline. Adversarial training improves the model’s adaptability to lighting changes by 5.3%. The multi-head self-attention (MSA) mechanism reduces long-range misclassification by 6.7%. The gated fusion strategy lowers false positive rates in background-disturbed regions by 12.8%. The framework optimizes segmentation through a generator-segmenter feedback loop, balancing dynamic background noise suppression against semantic fidelity. The contributions are threefold: (1) the first semantic segmentation framework to deeply integrate GANs and Transformers; (2) a theoretical model for dynamic feature gating and semantic consistency constraints; (3) a standardized evaluation system covering 10 dynamic background types and five illumination gradients. This study provides key technical support for real-time environmental perception in autonomous driving and intelligent surveillance, advancing both the theoretical and practical frontiers of dynamic scene understanding.
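
The abstract does not give the formula behind the gated fusion strategy. As a rough illustration of the general idea only, the sketch below uses one common form of gating: a sigmoid gate computed from both inputs, which then mixes them element-wise. The gate form, the scalar parameters `w_gan`, `w_trans`, and `bias`, and the function name `gated_fusion` are all assumptions for illustration, not the authors' design.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(gan_feat, trans_feat, w_gan, w_trans, bias):
    """Element-wise gated mix of two equally long feature vectors.

    For each position, a gate g in (0, 1) is computed from both inputs,
    and the output is g * gan_feat + (1 - g) * trans_feat.
    The scalar weights are hypothetical stand-ins for learned parameters.
    """
    fused = []
    for a, b in zip(gan_feat, trans_feat):
        g = sigmoid(w_gan * a + w_trans * b + bias)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# With zero weights and zero bias the gate is 0.5 everywhere,
# so the fusion reduces to a plain average of the two branches.
balanced = gated_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0, 0.0)

# A large positive bias pushes the gate towards 1,
# so the output follows the first (GAN) branch almost exactly.
gan_dominant = gated_fusion([1.0, 2.0], [3.0, 4.0], 0.0, 0.0, 50.0)
```

In a trained network the gate parameters would be learned and would typically be matrices or convolutions rather than scalars, which lets the model weight each branch differently per channel and per location.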


Authors

Yiqiang Li

Chengdu Institute of Biology

ZhenBao Luo

Chengdu Institute of Biology

Tao Chen

Chengdu Institute of Biology

Journals

Scientific Reports

Institutions

Chengdu Institute of Biology

Aviation Industry Corporation of China (China)
