Scene Graph Generation (SGG) aims to identify the objects in an image and infer the relationships among them, providing a comprehensive description of the image content. However, current methods suffer from a severe long-tailed predicate distribution, which makes fine-grained predicates difficult to train adequately and leads to inaccurate content understanding. To address this issue, we propose a template-guided data augmentation (TGDA) strategy that balances the data distribution and conducts secondary training for classifiers. First, we employ self-driven distillation learning to transfer advanced representation capabilities across all categories, extracting unique templates for each predicate. We then apply centroid radiating and gate filtering to these learnable templates to construct reliable instances, providing a rich source of supplementary data for fine-grained predicates. Extensive experiments validate the effectiveness of the proposed method, which achieves state-of-the-art performance on the Visual Genome (VG) dataset.
Zang et al. (Mon,) studied this question.
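
The abstract does not specify how centroid radiating and gate filtering are computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that "radiating" means sampling candidate features around a predicate template with isotropic Gaussian noise, and that the "gate" keeps a candidate only when its nearest template is still the one it was sampled from. The function name `augment_from_templates`, the `radius` parameter, and the fixed toy templates (the paper's templates are learnable) are all hypothetical.

```python
import numpy as np


def augment_from_templates(templates, target, n_samples, radius=0.5, seed=0):
    """Sample synthetic features around one template and keep the reliable ones.

    templates: array of shape (num_predicates, dim), one template per predicate.
    target:    index of the predicate to augment.
    """
    rng = np.random.default_rng(seed)
    centroid = templates[target]

    # Assumed "centroid radiating": perturb the template with Gaussian noise.
    noise = rng.standard_normal((n_samples, centroid.shape[0]))
    candidates = centroid + radius * noise

    # Assumed "gate filtering": keep a candidate only if the target template
    # is its nearest template in Euclidean distance.
    dists = np.linalg.norm(
        candidates[:, None, :] - templates[None, :, :], axis=-1
    )
    keep = dists.argmin(axis=1) == target
    return candidates[keep]


# Toy example: three 2-D templates standing in for three predicates.
templates = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
samples = augment_from_templates(templates, target=1, n_samples=100)
```

In a setup like this, the retained `samples` would be added to the training set of the under-represented predicate before the classifier is retrained; the choice of noise scale and gating rule would need to be tuned or replaced by whatever the paper actually defines.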