January 1, 2007

Learning Structured Appearance Models from Captioned Images of Cluttered Scenes

Abstract

Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to learn both the names and appearances of the objects. Only a small number of local features within any given image are associated with a particular caption word. We describe a connected graph appearance model where vertices represent local features and edges encode spatial relationships. We use the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to guide the search for meaningful feature configurations. We demonstrate improved results on a dataset to which an unstructured object model was previously applied. We also apply the new method to a more challenging collection of captioned images from the Web, detecting and annotating objects within highly cluttered realistic scenes.
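As a rough illustration of the kind of structure the abstract describes, the sketch below stores local features as vertices and spatial relationships as edges, checks that the graph is connected, and scores how well a model's detections line up with a caption word. All names here (AppearanceGraph, word_correspondence) are hypothetical. The feature labels, the use of a scalar distance on each edge, and the F1-style score are assumptions chosen for illustration, since the abstract does not specify the representation or the correspondence measure the authors use.

```python
from dataclasses import dataclass, field


@dataclass
class AppearanceGraph:
    """Hypothetical container: vertices are local features, edges are spatial relations."""
    # vertex id -> feature label (assumed here to be a quantized descriptor index)
    vertices: dict = field(default_factory=dict)
    # (u, v) with u < v -> spatial relation (assumed here to be a distance)
    edges: dict = field(default_factory=dict)

    def add_vertex(self, vid, label):
        self.vertices[vid] = label

    def add_edge(self, u, v, distance):
        if u not in self.vertices or v not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        self.edges[(min(u, v), max(u, v))] = distance

    def is_connected(self):
        """Depth-first search: True if every vertex is reachable from the first one."""
        if not self.vertices:
            return True
        neighbours = {vid: set() for vid in self.vertices}
        for u, v in self.edges:
            neighbours[u].add(v)
            neighbours[v].add(u)
        start = next(iter(self.vertices))
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return len(seen) == len(self.vertices)


def word_correspondence(detected_images, captioned_images):
    """F1 overlap between images where the model fires and images captioned with the word.

    This is an illustrative stand-in, not the measure used in the paper.
    """
    detected, captioned = set(detected_images), set(captioned_images)
    hits = len(detected & captioned)
    if hits == 0:
        return 0.0
    precision = hits / len(detected)
    recall = hits / len(captioned)
    return 2 * precision * recall / (precision + recall)


# Build a three-feature configuration and score it against a caption word.
graph = AppearanceGraph()
for vid, label in [(0, 17), (1, 42), (2, 5)]:
    graph.add_vertex(vid, label)
graph.add_edge(0, 1, 12.5)
graph.add_edge(1, 2, 8.0)
score = word_correspondence({1, 2, 3}, {2, 3, 4})
```

Under these assumptions, a configuration that fires mostly on images whose captions contain a given word scores close to 1, which is the sort of signal that could guide a search toward configurations worth keeping.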
