What does this research mean for the field?

Using unsupervised motion-based segmentation from videos as pseudo ground truth to train a convolutional network for single-frame object segmentation produces visual representations that significantly outperform existing unsupervised pretext tasks in transfer learning for object detection. Novelty: methodological. Consensus alignment: neutral.

July 1, 2017

Learning Features by Watching Objects Move

Deepak Pathak (Carnegie Mellon University), Ross Girshick (Allen Institute), Piotr Dollár (California Institute of Technology)

Abstract

This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as pseudo ground truth to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed pretext tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
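
The pipeline described in the abstract (motion yields segments, segments become pseudo ground truth, a network learns to predict them from one frame) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the motion segmentation algorithm or the loss, so thresholding residual optical flow stands in for the unsupervised segmenter, and a per-pixel binary cross-entropy stands in for the training objective. The names `pseudo_mask_from_flow`, `pixelwise_bce`, and `rel_threshold` are hypothetical.

```python
import numpy as np


def pseudo_mask_from_flow(flow, rel_threshold=2.0):
    """Turn a dense flow field of shape (H, W, 2) into a boolean (H, W) mask.

    Stand-in for an unsupervised motion segmenter (an assumption, not the
    paper's algorithm): pixels that move much more than average are
    treated as belonging to a moving object.
    """
    # Subtract the median flow as a crude estimate of camera motion.
    residual = flow - np.median(flow.reshape(-1, 2), axis=0)
    magnitude = np.linalg.norm(residual, axis=-1)
    # Threshold relative to the mean residual magnitude of the frame.
    scale = magnitude.mean() + 1e-8
    return magnitude > rel_threshold * scale


def pixelwise_bce(pred_prob, mask):
    """Mean binary cross-entropy between predicted foreground
    probabilities (H, W) and a boolean pseudo ground-truth mask (H, W)."""
    pred = np.clip(pred_prob, 1e-7, 1.0 - 1e-7)
    target = mask.astype(np.float64)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())


# Toy example: a 3x3 patch moves to the right on a static background.
flow = np.zeros((8, 8, 2))
flow[2:5, 3:6] = [3.0, 0.0]
mask = pseudo_mask_from_flow(flow)

# A segmentation network would see only a single frame and be trained to
# reproduce `mask`; here two fixed predictions are compared instead.
loss_matching = pixelwise_bce(mask.astype(np.float64), mask)
loss_uninformed = pixelwise_bce(np.full((8, 8), 0.5), mask)
```

In this sketch the motion signal is needed only to build the training target. The predictor itself never sees flow, which matches the abstract's description of segmenting objects from a single frame.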
