Multispectral object detection has shown great promise in security and industrial applications. RGB images offer rich texture but are sensitive to lighting conditions, whereas infrared (IR) images perform well in low light but lack texture. Current methods struggle both to capture the information differences between modalities accurately and to fuse features across modalities effectively. To address these issues, we propose a graph aggregation alignment network (GAANet) for multispectral object detection. GAANet consists of two key modules: the graph interaction fusion module (GIFM) and the information alignment module (IAM). GIFM uses graph representation learning to process single-modality features, while a direct-connection information flow mechanism uses low-level multimodal features as guidance and reference, ensuring global and comprehensive fusion of node information in the graph space. The IAM then refines the fused results through a secondary calibration that aligns corresponding local regions, ensuring accurate fusion. We also introduce an information reconstruction path (IRP) and a reconstruction loss to prevent single-modality information from being lost over repeated IAM computations. GAANet achieves strong fusion detection capability while substantially reducing the number of parameters: its model size is 61.2% smaller than that of representative baselines such as CALNet. GAANet achieves state-of-the-art detection accuracy on the DroneVehicle, LLVIP, and FLIR datasets. It also performs well on the unaligned DVTOD dataset, where global graph perception effectively captures feature offsets across modalities.
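To make the idea of fusing modalities "in the graph space" more concrete, the following is a minimal sketch, not GAANet's actual GIFM, IAM, or IRP. It assumes that each spatial position of the RGB and IR feature maps becomes a node, that all nodes from both modalities share one fully connected graph, and that edge weights come from a softmax over scaled dot-product similarity. The functions `graph_aggregate`, `fuse`, and `reconstruction_loss` are hypothetical names chosen for illustration, and the mean-squared-error form of the reconstruction loss is likewise an assumption. It is written in plain Python so it runs without any deep-learning framework.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_aggregate(nodes: List[Vector]) -> List[Vector]:
    """One round of global message passing on a fully connected graph.

    Assumption: edge weights are a row-wise softmax over scaled dot-product
    similarity, so every node becomes a convex combination of all nodes.
    """
    dim = len(nodes[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for xi in nodes:
        weights = softmax([dot(xi, xj) * scale for xj in nodes])
        out.append([sum(w * xj[k] for w, xj in zip(weights, nodes)) for k in range(dim)])
    return out


def fuse(rgb: List[Vector], ir: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: put RGB and IR nodes into one graph, aggregate,
    then average each aggregated RGB node with the IR node at the same position."""
    n = len(rgb)
    agg = graph_aggregate(rgb + ir)  # RGB nodes first, then IR nodes
    return [[(a + b) / 2 for a, b in zip(agg[i], agg[n + i])] for i in range(n)]


def reconstruction_loss(original: List[Vector], reconstructed: List[Vector]) -> float:
    """Assumed mean squared error between single-modality features and their
    reconstruction, penalizing information lost during fusion."""
    total, count = 0.0, 0
    for o, r in zip(original, reconstructed):
        for a, b in zip(o, r):
            total += (a - b) ** 2
            count += 1
    return total / count


# Usage: two spatial positions with 2-dimensional features per modality.
rgb = [[1.0, 0.0], [0.0, 1.0]]
ir = [[0.9, 0.1], [0.2, 0.8]]
fused = fuse(rgb, ir)
```

Because every node attends to every other node, including those of the other modality, each fused feature can draw on information from anywhere in both images. This is one plausible reading of why a global graph view can tolerate the cross-modal offsets found in unaligned data such as DVTOD.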
Zheng et al. studied this question.