What question did this study set out to answer?

The aim is to enhance oriented object detection performance, particularly in complex scenes with small and elongated objects.

February 14, 2026

TG-DANet: Text-Guided Dual-Awareness Network for Oriented Object Detection

Key Points

Developed the Text-Guided Dual-Awareness Network (TG-DANet) for oriented object detection.
Designed a Bi-Directional Feature Interaction Module (BDFIM) to capture horizontal and vertical contextual features.
Implemented a Text-Semantic Guided Framework (TSGF) to align and fuse textual embeddings with visual features at multiple levels.
Evaluated the approach on three benchmark datasets: DOTA, DIOR-R, and HRSC2016.
Achieved a 3.05% improvement in mAP over baseline methods on the DOTA dataset.
Achieved a 3.49% improvement in mAP over baseline methods on the DIOR-R dataset.
Achieved a 2.32% improvement in mAP over baseline methods on the HRSC2016 dataset.

Abstract

Oriented object detection (OOD) has advanced rapidly in recent years. However, existing methods still perform unsatisfactorily in challenging scenarios, especially scenes involving small-scale objects or objects with extreme aspect ratios. Inspired by recent advances in vision-language pre-training, we propose a novel Text-Guided Dual-Awareness Network (TG-DANet), which addresses these challenges from two complementary perspectives: robust feature interaction for multi-scale and long-range context modeling, and semantic-aware feature learning through textual guidance. Specifically, we design a Bi-Directional Feature Interaction Module (BDFIM) to capture horizontal and vertical contextual features via spatial interactions, which improves the representation of small and elongated objects. Additionally, a Text-Semantic Guided Framework (TSGF) is designed to align and fuse textual embeddings with visual features at multiple levels, which enhances model interpretability and discriminability for objects with ambiguous appearances or complex layouts. Extensive experiments on three benchmark datasets (DOTA, DIOR-R, and HRSC2016) show that TG-DANet achieves improvements of 3.05%, 3.49%, and 2.32% in mAP over baseline methods, respectively. These results demonstrate the effectiveness of our dual-perspective strategy in handling complex scenes with cluttered backgrounds and multi-scale objects, and they highlight the promising potential of vision-language fusion in oriented object detection.
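
The abstract names two ideas without giving implementation details: gathering context along horizontal and vertical directions, and using text embeddings to guide visual features. The following is a minimal, illustrative sketch of both in plain Python. It is not the authors' BDFIM or TSGF. The function names (`directional_context`, `text_guided_weights`), the use of row and column averaging, and the softmax over cosine similarities are all assumptions chosen only to make the general concepts concrete.

```python
import math


def directional_context(feat):
    """Add row-wise and column-wise context to a 2D feature map.

    Assumption: context is approximated by the mean of each row
    (horizontal) and each column (vertical). The paper's BDFIM is
    not specified at this level of detail.
    """
    h, w = len(feat), len(feat[0])
    row_mean = [sum(row) / w for row in feat]
    col_mean = [sum(feat[i][j] for i in range(h)) / h for j in range(w)]
    # Each cell keeps its own value and receives context from its
    # whole row and whole column, which helps long, thin structures.
    return [
        [feat[i][j] + row_mean[i] + col_mean[j] for j in range(w)]
        for i in range(h)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def text_guided_weights(visual_vec, text_embeddings, temperature=0.1):
    """Softmax over cosine similarities between one visual vector
    and each class text embedding.

    Assumption: alignment is modeled as temperature-scaled cosine
    similarity, a common choice in vision-language models; the
    paper's TSGF may align and fuse features differently.
    """
    sims = {
        name: cosine(visual_vec, emb) / temperature
        for name, emb in text_embeddings.items()
    }
    peak = max(sims.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


if __name__ == "__main__":
    # Toy 2x2 feature map and toy 2D embeddings (hypothetical values).
    print(directional_context([[1.0, 2.0], [3.0, 4.0]]))
    toy_text = {"ship": [1.0, 0.0], "bridge": [0.0, 1.0]}
    print(text_guided_weights([1.0, 0.0], toy_text))
```

In a real detector these operations would run on multi-channel tensors at several feature-pyramid levels, and the text embeddings would come from a pre-trained vision-language model. The toy version only shows why directional aggregation suits elongated objects and how text similarity can reweight ambiguous visual features.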
