What question did this study set out to answer?

This work aims to enhance weakly supervised object detection by introducing a collaborative fusion method.

March 28, 2026 · Open Access

Plug-and-Play Global and Local Collaborative Fusion for Weakly Supervised Object Detection

Key Points

Developed a global information awareness module using singular value decomposition for image reconstruction.
Proposed a local detail fusion module to improve detail learning in visual encoding.
Conducted extensive experiments to validate the effectiveness of the proposed methods.
Achieved mAP scores of 60.2% on PASCAL VOC 2007, 57.4% on VOC 2012, and 23.2% on COCO.
Surpassed baseline methods by +2.0%, +1.2%, and +0.3% mAP on PASCAL VOC 2007, VOC 2012, and COCO, respectively.
Established new state-of-the-art performance on all benchmarks.

Abstract

• We propose a plug-and-play global and local collaborative fusion method to improve the performance of weakly supervised object detection.
• We design a pixel-level global information awareness module that utilizes singular value decomposition for image reconstruction.
• We propose a local detail fusion module to enable the visual encoder to learn detailed information about target objects.
• We demonstrate the effectiveness and superiority of our plug-and-play method through extensive experiments.

Weakly supervised object detection (WSOD) has drawn much attention because it is close to practical applications, and researchers have proposed the multi-instance learning (MIL) approach, which handles it as a multi-class classification problem. Although these methods have yielded promising results, the lack of instance-level annotation means that extraneous information in the images severely affects the model’s feature learning. To alleviate this limitation, this paper proposes a global and local collaborative fusion method for WSOD that leverages the complementary information of the original image and its low-rank approximation. Specifically, we design a pixel-level global information awareness (GIA) module that reconstructs the input image and removes redundant noise; the reconstructed image is then fed into a visual encoder to extract features from a global perspective. Moreover, to compensate for the lack of detail preservation in GIA, we further propose a local detail fusion (LDF) module that fuses image details by leveraging both the reconstructed and input images. Our proposed GIA-LDF modules are architecture-agnostic and can be seamlessly embedded into any MIL-based WSOD pipeline. Extensive experiments validate the effectiveness of our plug-and-play GIA-LDF for WSOD. We achieve 60.2%, 57.4%, and 23.2% mAP on PASCAL VOC 2007, VOC 2012, and COCO, respectively, surpassing baseline methods by +2.0%, +1.2%, and +0.3%, and establishing new state-of-the-art performance across all benchmarks.
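The abstract says that GIA reconstructs the input image through singular value decomposition and works with a low-rank approximation of it. The sketch below illustrates only that general idea, a truncated SVD applied to an image, and is not the authors' implementation. The function name `low_rank_reconstruct`, the per-channel decomposition, and the chosen rank are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def low_rank_reconstruct(image, rank):
    """Hypothetical helper: rank-`rank` approximation of each channel.

    Assumes `image` is an H x W x C array. Decomposing each channel
    separately is an illustrative choice, not something the abstract states.
    """
    image = np.asarray(image, dtype=np.float64)
    channels = []
    for c in range(image.shape[2]):
        # Thin SVD: channel = u @ diag(s) @ vt, singular values in descending order.
        u, s, vt = np.linalg.svd(image[:, :, c], full_matrices=False)
        k = min(rank, s.size)
        # Keep only the k largest singular values and their vectors.
        channels.append((u[:, :k] * s[:k]) @ vt[:k, :])
    return np.stack(channels, axis=-1)


# Random stand-in for an image; a real pipeline would load actual pixels.
rng = np.random.default_rng(0)
image = rng.random((32, 48, 3))

recon = low_rank_reconstruct(image, rank=8)  # rank 8 is an arbitrary assumption
residual = image - recon                     # detail discarded by the truncation
```

Truncating the SVD keeps the dominant global structure of each channel and discards the low-energy components, which is why fine detail ends up in `residual`. How the paper's LDF module actually recombines detail from the reconstructed and input images is not described in the abstract, so no fusion step is sketched here.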


Cite This Study

Liang et al. studied this question.

synapsesocial.com/papers/69c771198bbfbc51511e100d
https://doi.org/10.1016/j.knosys.2026.115857
