The aim of large scale specific-object image retrieval systems is to instantaneously find images that contain the query object in the image database. Current systems, for example Google Goggles, concentrate on querying using a single view of an object, e.g. a photo a user takes with his mobile phone, in order to answer the question "what is this?". Here we consider the somewhat converse problem of finding all images of an object given that the user knows what he is looking for; so the input modality is text, not an image. This problem is useful in a number of settings, for example media production teams are interested in searching internal databases for images or video footage to accompany news reports and newspaper articles. Given a textual query (e.g. "coca cola bottle"), our approach is to first obtain multiple images of the queried object using textual Google image search. These images are then used to visually query the target database to discover images containing the object of interest. We compare a number of different methods for combining the multiple query images, including discriminative learning. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. The system is evaluated quantitatively on the standard Oxford Buildings benchmark, where it achieves very high retrieval performance, and also qualitatively on the TrecVid 2011 known-item search dataset.
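As an illustration of what combining multiple query images can look like, the following is a minimal sketch, not the authors' implementation. It assumes each query image has already been run against the target database and has produced a dictionary of similarity scores per database image. The function name `combine_scores`, the score dictionaries, and the two fusion rules shown ("max" and "avg") are hypothetical choices made for this example; the abstract does not specify which combination methods were compared beyond mentioning discriminative learning.

```python
from typing import Dict, List


def combine_scores(per_query_scores: List[Dict[str, float]],
                   method: str = "max") -> List[str]:
    """Merge per-query similarity scores into one ranked list of image ids.

    per_query_scores: one dict per query image, mapping a database image id
    to its similarity score for that query (hypothetical input format).
    method: "max" keeps the best score over all queries, "avg" averages them.
    """
    if method not in ("max", "avg"):
        raise ValueError("method must be 'max' or 'avg'")

    # Collect every database image that was retrieved by at least one query.
    all_ids = set()
    for scores in per_query_scores:
        all_ids.update(scores)

    combined = {}
    for image_id in all_ids:
        # Assumption: an image not retrieved by a query scores 0.0 for it.
        values = [scores.get(image_id, 0.0) for scores in per_query_scores]
        if method == "max":
            combined[image_id] = max(values)
        else:
            combined[image_id] = sum(values) / len(values)

    # Highest combined score first; ties are broken by image id for stability.
    return sorted(combined, key=lambda i: (-combined[i], i))


if __name__ == "__main__":
    # Two hypothetical query images for the same textual query.
    per_query = [
        {"a": 0.9, "b": 0.2},
        {"b": 0.8, "c": 0.5},
    ]
    print(combine_scores(per_query, "max"))
    print(combine_scores(per_query, "avg"))
```

Under these assumptions, taking the maximum favours a database image that matches any single query view strongly, while averaging favours images that match several views moderately well. This is one way a set of query images can surface occurrences that a single view would miss.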
Arandjelović et al. (Sun,) studied this question.