We present topic-regression multi-modal Latent Dirichlet Allocation (tr-mmLDA), a novel statistical topic model for the task of image and video annotation. At the heart of our new annotation model lies a novel latent variable regression approach to capture correlations between image or video features and annotation texts. Instead of sharing a set of latent topics between the two data modalities, as in the formulation of correspondence LDA in [2], our approach introduces a regression module to correlate the two sets of topics, which captures more general forms of association and allows the number of topics in the two data modalities to be different. We demonstrate the power of tr-mmLDA on two standard annotation datasets: a 5000-image subset of COREL and a 2687-image LabelMe dataset. The proposed association model shows improved performance over correspondence LDA as measured by caption perplexity.
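Caption perplexity, the evaluation measure named above, is commonly computed as the exponential of the negative average log-likelihood per caption word on held-out data; lower is better. The sketch below illustrates that standard definition only. The function name `caption_perplexity` and the input layout (one list of per-word log-probabilities per test image) are assumptions made for illustration, and the abstract does not specify how tr-mmLDA produces these probabilities.

```python
import math


def caption_perplexity(log_probs_per_image):
    """Perplexity over held-out captions.

    log_probs_per_image: list of lists (hypothetical layout); each inner
    list holds the natural-log probability a model assigns to each caption
    word of one test image, conditioned on that image.
    """
    # Sum log-likelihood over every caption word in every test image.
    total_log_prob = sum(sum(doc) for doc in log_probs_per_image)
    # Count all caption words so the average is per word, not per image.
    total_words = sum(len(doc) for doc in log_probs_per_image)
    if total_words == 0:
        raise ValueError("need at least one caption word")
    return math.exp(-total_log_prob / total_words)
```

As a sanity check on the definition, a model that assigns probability 1/4 to every caption word has perplexity 4, the size of an equally likely vocabulary.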
Duangmanee Putthividhy
University of California, San Diego
Hagai Attias
Gatsby Charitable Foundation
Srikantan S. Nagarajan
University of California, San Francisco
Putthividhy et al. studied this question.
synapsesocial.com/papers/6a18715540b522b8b365bbf8 — DOI: https://doi.org/10.1109/cvpr.2010.5540000