While existing multi-label classification methods primarily focus on capturing label co-occurrence patterns, they often fail to exploit the global contextual information and the semantic hierarchy inherent in multi-label datasets, especially in large-scale label spaces, which leads to suboptimal feature extraction. To address this limitation, we propose a novel Topic-Aware Transformer with Hierarchical Prompting Learning (TATHPL), which hierarchically integrates latent topic information to improve multi-label image classification. Specifically, we leverage prompt learning and introduce multiple prompt tokens to learn topic relationships among label combinations. These prompts are hierarchically inserted into specific transformer blocks, where the self-attention mechanism lets them influence subsequent feature extraction. Compared to previous methods, our approach adds only a small number of parameters to a standard Vision Transformer while remaining highly effective. TATHPL achieves state-of-the-art performance on three benchmark datasets: a mean average precision (mAP) of 81.9% on the Microsoft Common Objects in Context (MS-COCO) dataset and 67.0% on NUS-WIDE, and an overall F1-measure (OF1) of 69.7% on Corel5k, outperforming the best baseline methods by 1.3%, 0.7%, and 22.7%, respectively. These results highlight its potential for improving multi-label classification tasks. The code is available at https://github.com/JiangPengWang-c/TATHPL.
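To make the hierarchical insertion concrete, the following is a minimal sketch, not the TATHPL implementation. It assumes that prompt tokens are appended to the token sequence just before chosen blocks and that each block is a single-head self-attention layer with identity projections and a residual connection. The names `self_attention`, `forward`, and `prompts_by_block` are hypothetical, and the prompt vectors are fixed here, whereas in the method described above they are learned.

```python
import math
from typing import Dict, List

Vector = List[float]
Tokens = List[Vector]


def self_attention(tokens: Tokens) -> Tokens:
    """Toy single-head self-attention (identity Q/K/V projections) with a residual."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out: Tokens = []
    for query in tokens:
        # Scaled dot-product scores of this token against every token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Attention-weighted average of all token vectors.
        mixed = [
            sum(w / total * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        out.append([x + m for x, m in zip(query, mixed)])
    return out


def forward(patches: Tokens, prompts_by_block: Dict[int, Tokens], num_blocks: int) -> Tokens:
    """Run the blocks, appending each block's prompt tokens before that block.

    Assumption: inserted prompts stay in the sequence for all later blocks,
    so they can influence the patch tokens through attention.
    """
    tokens = [list(t) for t in patches]
    for block in range(num_blocks):
        tokens = tokens + [list(p) for p in prompts_by_block.get(block, [])]
        tokens = self_attention(tokens)
    return tokens


# Two 2-dimensional "patch" tokens and one hypothetical prompt at blocks 1 and 2.
patches = [[1.0, 0.0], [0.0, 1.0]]
prompts_by_block = {1: [[0.5, 0.5]], 2: [[-1.0, 1.0]]}

with_prompts = forward(patches, prompts_by_block, num_blocks=3)
without = forward(patches, {}, num_blocks=3)

print(len(without), len(with_prompts))  # sequence length grows by one per inserted prompt
print(with_prompts[:2] != without)      # patch features change once prompts are attended to
```

In this sketch the only added state is the prompt vectors themselves, which is in the spirit of the small parameter overhead claimed above; a real model would also have learned projections, multiple heads, and feed-forward layers in every block.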
Wang et al. (Thu,) studied this question.