What question did this study set out to answer?

The aim is to enhance multi-label image classification by integrating global context and semantic hierarchy.

February 2, 2026 | Open Access

Topic-aware transformer with hierarchical prompting learning for multi-label image classification

Key points

Proposed a Topic-Aware Transformer with Hierarchical Prompting Learning (TATHPL)
Utilized prompt learning with multiple prompt tokens to capture topic relationships
Inserted prompts hierarchically into transformer blocks with self-attention mechanism
Achieved mean average precision (mAP) of 81.9% on MS-COCO
Attained 67.0% mAP on NUS-WIDE
Produced an overall F1-measure of 69.7% on Corel5k
Outperformed the best baseline methods by 1.3%, 0.7%, and 22.7% on MS-COCO, NUS-WIDE, and Corel5k, respectively
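
The two metrics reported above can be illustrated with a minimal sketch. It uses the common textbook definitions of mAP (per-class average precision, averaged over classes) and overall F1 (computed from label counts pooled over all images). The function names and the 0.5 decision threshold are assumptions made for illustration. The paper's exact evaluation protocol (for example, top-k prediction on Corel5k) is not stated here and may differ.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive examples contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average the per-class AP over all classes (matrix columns)."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / num_classes


def overall_f1(score_matrix, label_matrix, threshold=0.5):
    """OF1: harmonic mean of overall precision and overall recall.

    Counts are pooled over every (image, label) pair. The threshold is an
    assumed decision rule, not one taken from the paper.
    """
    correct = predicted = actual = 0
    for scores, labels in zip(score_matrix, label_matrix):
        for score, label in zip(scores, labels):
            prediction = 1 if score >= threshold else 0
            correct += prediction * label
            predicted += prediction
            actual += label
    if correct == 0:
        return 0.0
    overall_precision = correct / predicted
    overall_recall = correct / actual
    return 2 * overall_precision * overall_recall / (overall_precision + overall_recall)


# Toy data: 3 images, 2 labels (rows are images, columns are labels).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6]]
toy_labels = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_labels)
toy_of1 = overall_f1(toy_scores, toy_labels)
```

mAP is threshold-free because it depends only on the ranking of scores within each class, whereas OF1 depends on the chosen decision rule, so the two numbers are not directly comparable across datasets.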

Abstract

While existing multi-label classification methods primarily focus on capturing label co-occurrence patterns, they often fail to explore the global contextual information and the semantic hierarchy inherent in multi-label datasets, especially in large-scale label spaces, leading to suboptimal feature extraction. To address this limitation, a novel Topic-Aware Transformer with Hierarchical Prompting Learning (TATHPL) is proposed, which hierarchically integrates latent topic information to improve the performance of multi-label image classification. Specifically, we leverage prompt learning and introduce multiple prompt tokens to learn topic relationships among label combinations. These prompts are hierarchically inserted into specific transformer blocks, where the self-attention mechanism lets them influence subsequent feature extraction. Compared to previous methods, our approach requires only a minimal increase in parameters on top of a standard Vision Transformer while maintaining high effectiveness. Our proposed TATHPL demonstrates state-of-the-art performance on three benchmark datasets. Specifically, it achieves mean average precision (mAP) scores of 81.9% and 67.0% on the Microsoft Common Objects in Context (MS-COCO) and NUS-WIDE datasets, respectively, and an overall F1-measure (OF1) of 69.7% on Corel5k, outperforming the best baseline methods by 1.3%, 0.7%, and 22.7%, respectively. The results demonstrate that our method significantly outperforms the baseline models, highlighting its potential for improving multi-label classification tasks. The code is available at https://github.com/JiangPengWang-c/TATHPL.
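
The hierarchical prompt insertion described in the abstract can be sketched roughly as follows. This is a toy illustration and not the authors' implementation. The names `forward` and `prompts_by_block`, the identity-projection attention, and the choice to keep inserted prompts in the sequence for all later blocks are assumptions made here. The actual TATHPL design should be taken from the linked repository.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (illustration only)."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)])
    return outputs


def block(tokens):
    """Stand-in transformer block: self-attention plus a residual connection."""
    attended = self_attention(tokens)
    return [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


def forward(patch_tokens, prompts_by_block, num_blocks):
    """Run the blocks, prepending each block's prompt tokens just before it executes.

    prompts_by_block maps a block index to a list of prompt vectors (hypothetical
    layout). Once inserted, prompts stay in the sequence, so every later block
    attends to them.
    """
    tokens = [list(tok) for tok in patch_tokens]
    num_prompts = 0
    for index in range(num_blocks):
        new_prompts = prompts_by_block.get(index, [])
        tokens = [list(p) for p in new_prompts] + tokens
        num_prompts += len(new_prompts)
        tokens = block(tokens)
    return tokens, num_prompts


# Toy setup: 3 patch tokens of width 4, with prompts entering at blocks 0 and 2.
rng = random.Random(0)
dim = 4
patches = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
prompts = {
    0: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)],
    2: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(1)],
}
with_prompts, prompt_count = forward(patches, prompts, num_blocks=4)
without_prompts, _ = forward(patches, {}, num_blocks=4)
```

In this sketch the only extra learnable values are the prompt vectors themselves (here 3 tokens of width 4), which is consistent in spirit with the abstract's claim of a minimal parameter increase over a standard Vision Transformer. Comparing `with_prompts[-1]` and `without_prompts[-1]` shows that the same patch token ends up with different features once prompts take part in self-attention.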
