July 1, 2017

Combining Bottom-Up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with training images. Our problem statement differs from common semantic segmentation, where pixelwise annotations are typically assumed available in training. We specify a novel deep architecture which fuses three distinct computation processes toward semantic segmentation - namely, (i) the bottom-up computation of neural activations in a CNN for the image-level prediction of object classes; (ii) the top-down estimation of conditional likelihoods of the CNN's activations given the predicted objects, resulting in probabilistic attention maps per object class; and (iii) the lateral attention-message passing from neighboring neurons at the same CNN layer. The fusion of (i)-(iii) is realized via a conditional random field as recurrent network aimed at generating a smooth and boundary-preserving segmentation. Unlike existing work, we formulate a unified end-to-end learning of all components of our deep architecture. Evaluation on the benchmark PASCAL VOC 2012 dataset demonstrates that we outperform reasonable weakly supervised baselines and state-of-the-art approaches.

Me gusta

Guardar