January 23, 2023

Open Access

Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates

Abstract

Research related to the fashion and e-commerce domains is gaining attention in the computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed that integrates an external textual memory accessed through k-nearest neighbor (kNN) searches. From an architectural point of view, the proposed transformer model reads and retrieves items from the external memory through cross-attention operations, and regulates the flow of information coming from the external memory through a novel fully attentive gate. Experimental analyses were carried out on the fashion captioning dataset (FACAD), which contains more than 130k fine-grained descriptions. The results validate the effectiveness of the proposed approach and of its architectural strategies in comparison with carefully designed baselines and state-of-the-art approaches. The presented method consistently outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning.
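The retrieve, attend, and gate pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the gate's formula, so the sigmoid gate over the concatenated hidden state and attended memory used here is an assumption, as are all function names (`knn_retrieve`, `cross_attend`, `gated_fusion`), the cosine similarity measure, the reuse of memory vectors as both keys and values, and the toy 2-dimensional vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def knn_retrieve(query, memory_keys, k):
    """Return the indices of the k memory keys most cosine-similar to the query."""
    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)
    scores = [cosine(query, key) for key in memory_keys]
    ranked = sorted(range(len(memory_keys)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(hidden, retrieved):
    """Single-head scaled dot-product attention of one hidden state over retrieved vectors."""
    scale = math.sqrt(len(hidden))
    weights = softmax([dot(hidden, r) / scale for r in retrieved])
    return [sum(w * r[j] for w, r in zip(weights, retrieved)) for j in range(len(hidden))]

def gated_fusion(hidden, attended, gate_weights):
    """ASSUMPTION: a per-dimension sigmoid gate over [hidden; attended].
    The abstract does not specify the gate, so this is illustrative only."""
    concat = hidden + attended  # list concatenation, not element-wise addition
    gate = [1.0 / (1.0 + math.exp(-dot(w, concat))) for w in gate_weights]
    return [h + g * a for h, g, a in zip(hidden, gate, attended)]

# Toy external memory of four 2-d "text embeddings" (hypothetical values).
memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
hidden = [1.0, 0.0]

idx = knn_retrieve(hidden, memory, k=2)
attended = cross_attend(hidden, [memory[i] for i in idx])
# All-zero gate weights give a gate value of 0.5 in every dimension.
gate_weights = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion(hidden, attended, gate_weights)
```

In this sketch the gate scales how much of the attended memory is added back to the decoder state: a gate near 0 ignores the retrieved text, while a gate near 1 passes it through in full. In a trained model the gate weights would be learned rather than fixed.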

Authors

Nicholas Moratelli

University of Modena and Reggio Emilia

Manuele Barraco

University of Modena and Reggio Emilia

Davide Morelli

University of Modena and Reggio Emilia

Journals

Sensors

Institutions

University of Modena and Reggio Emilia

Ferrari (Italy)
