What type of study is this?

This is a Literature Review study.

September 10, 2025 · Open Access

Exploring the Current State of Transformer's Application in the Field of Target Detection

Key Points

Transformers improve average precision (AP) in target detection by effectively modeling long-range dependencies in images.
Key models analyzed include the Detection Transformer (DETR), the Deformable Detection Transformer (Deformable DETR), and the Shifted Window Transformer (Swin Transformer).
Current challenges for Transformer-based target detection include computational efficiency and hardware dependence; lightweight designs are suggested as solutions.
Future strategies such as dynamic sparsification and cross-modal alignment may further improve the performance of Transformer models.
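The long-range dependency modeling credited to Transformers above comes from global self-attention, in which every image token is compared with every other token. The following is a minimal, dependency-free sketch of single-head scaled dot-product self-attention, not the implementation used by any of the reviewed models. It assumes identity query/key/value projections (real models learn separate projection matrices), and the names `self_attention` and `tokens` are hypothetical, chosen for illustration.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: queries, keys and values are the raw token
    features (identity projections). `tokens` is a list of equal-length
    feature vectors, one per image patch. Returns (outputs, weights), where
    weights[i][j] is how strongly token i attends to token j.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Each query is scored against every key, regardless of spatial distance.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted sum of all value vectors.
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


# Hypothetical toy input: a 1-D row of six patches in which the first and
# last patches share one feature and the four in between share another.
tokens = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
outputs, weights = self_attention(tokens)
print([round(x, 3) for x in weights[0]])
```

Because patch 0 and patch 5 receive identical scores, `weights[0][0]` equals `weights[0][5]`: within a single layer the first patch draws as much on the distant last patch as on itself, whereas a width-3 convolution at the same position would only see its immediate neighbors.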

Abstract

Target detection, a core computer vision task, is widely applied in autonomous driving, industrial quality inspection, and other fields. However, traditional convolutional neural networks (CNNs) are limited by their local receptive fields and struggle to model global contextual relationships, which leads to missed small targets and misjudgments under occlusion in complex scenes. The Transformer, with its global attention mechanism, can effectively capture long-range dependencies in images, which has markedly improved the accuracy and efficiency of target detection. This paper analyzes the evolution of key models such as the Detection Transformer (DETR), the Deformable Detection Transformer (Deformable DETR), and the Shifted Window Transformer (Swin Transformer), explains why these models significantly improve average precision (AP) on the COCO dataset, and examines end-to-end detection, sparse attention mechanisms, and hierarchical design. Despite the Transformer's breakthroughs in accuracy, challenges remain in computational efficiency and hardware dependence. This paper concludes that lightweight design and multimodal fusion have great potential to address these challenges and to advance Transformers in real-time and multi-scenario detection, and that future strategies such as dynamic sparsification and cross-modal alignment can further improve model performance. This paper provides a comprehensive view of Transformers' application in target detection and serves as a key reference for future research directions.
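The computational-efficiency challenge, and why Swin Transformer's window-based hierarchical design eases it, can be made concrete with the complexity expressions reported in the Swin Transformer paper for global multi-head self-attention (MSA) versus window-based attention (W-MSA). The sketch below only evaluates those two expressions, which ignore the softmax. The function names are hypothetical, and the example setting (a 56×56 token map with C = 96 and window size M = 7, as in the first stage of Swin-T on a 224×224 input) is an assumption for illustration, not a figure from this review.

```python
def msa_complexity(h, w, C):
    """Global multi-head self-attention over an h x w token map with C channels.

    Expression as reported in the Swin Transformer paper (softmax ignored):
    4*h*w*C^2 for the query/key/value/output projections, plus
    2*(h*w)^2*C for the attention matrix and its product with the values.
    The second term is quadratic in the token count h*w.
    """
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def window_msa_complexity(h, w, C, M=7):
    """Self-attention restricted to non-overlapping M x M windows.

    Each token attends only within its own window, so the second term
    becomes 2*M^2*h*w*C, which is linear in h*w for a fixed window size M.
    """
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    # Assumed setting: first stage of Swin-T on a 224x224 image
    # (4x4 patches -> 56x56 tokens, C = 96, window size M = 7).
    g = msa_complexity(56, 56, 96)
    win = window_msa_complexity(56, 56, 96)
    print(g, win, round(g / win, 1))
```

Run as a script, this prints `2003828736 145108992 13.8`: at this resolution windowed attention costs roughly one fourteenth of global attention, and doubling the input side length multiplies the windowed cost by exactly 4 while the global cost grows faster than that because of the quadratic term.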
