January 1, 2017

Open Access

Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma

Abstract

Fine-grained image retrieval (FGIR) enables a user to search for a photo of an object instance based on a mental picture. Depending on how the user describes the object, two general approaches exist: sketch-based FGIR and text-based FGIR, each with its own pros and cons. However, no attempt has been made to systematically investigate how informative each of these two input modalities is and, more importantly, whether they are complementary to each other and thus should be modelled jointly. In this work, we introduce for the first time a multi-modal FGIR dataset with both sketches and sentence descriptions provided as query modalities. A multi-modal quadruplet deep network is formulated to jointly model the sketch and text input modalities as well as the photo output modality. We show that, on its own, the sketch modality is much more informative than text, and that each modality can benefit the other when they are modelled jointly.

Cite This Study

Song et al. (2017) studied this question.

synapsesocial.com/papers/6a11c388279ddf38dc61828e https://doi.org/10.5244/c.31.45
