What type of study is this?

This is a Literature Review study.

August 25, 2025Open Access

Fine-Grained Open-Vocabulary Object Detection via Attribute Decomposition and Aggregation

Puntos clave

Our approach significantly improves detection of unseen fine-grained categories, reducing bias from textual descriptions.
We achieved notable performance improvements, particularly on specialized military and public datasets, outperforming existing methods.
The attribute decomposition and aggregation method supports better utilization of textual attributes, enhancing model discriminative capability.
This novel framework addresses challenges in fine-grained detection while leveraging multi-label learning for attribute representation.

Resumen

Open-vocabulary object detection (OVOD) aims to localize and recognize objects in images by leveraging category-specific textual inputs, including both known and novel categories. While existing methods excel in general scenarios, their performance significantly deteriorates in domain-specific fine-grained detection because of their heavy reliance on high-quality textual descriptions. In specialized domains, such textual descriptions are often affected by newly introduced terms or subjective human biases, limiting their applicability. In this paper, we propose an attribute decomposition–aggregation approach for the OVOD to address these challenges. By decomposing categories into fine-grained attributes and learning them in a multi-label manner, our method mitigates text quality issues caused by novel terms and human bias. During inference, unseen fine-grained category texts can be effectively represented by combining the decomposed attributes for detection. Even if the model learns the attributes, a key limitation of current methods is the insufficient utilization of textual attributes. To mitigate this issue, we propose an attribute-aggregation module that enhances the discriminative capability by emphasizing critical attributes for distinguishing target objects from foreground elements. To demonstrate the effectiveness of our OVOD framework, we evaluate our method on both our newly constructed military dataset and the public LAD dataset. Experimental results demonstrate that our method outperforms existing methods in domain-specific fine-grained open-vocabulary detection tasks.

Leer artículo completoexternamente

Me gusta

Guardar

Ver artículo completo

Cite This Study

Dou et al. (Mon,) studied this question.

synapsesocial.com/papers/68af6216ad7bf08b1eae38f1 https://doi.org/https://doi.org/10.3390/ai6090201

Me gusta

Guardar

Ver artículo completo