May 31, 2017 | Open Access

Unsupervised Learning of Disentangled Representations from Video

Emily Denton (Google, United States), Vighnesh Birodkar (Google, United States)

Abstract

We present a new model, DrNET, that learns disentangled image representations from video. Our approach leverages the temporal coherence of video and a novel adversarial loss to learn a representation that factorizes each frame into a stationary part and a temporally varying component. The disentangled representation can be used for a range of tasks. For example, applying a standard LSTM to the time-varying components enables prediction of future frames. We evaluate our approach on a range of synthetic and real videos, demonstrating the ability to coherently generate hundreds of steps into the future.
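
The prediction scheme described in the abstract (hold the stationary part fixed, roll the time-varying part forward, then decode) can be illustrated with a small toy sketch. This is not the DrNET implementation: the "frames" are plain tuples, the encoders and decoder are hypothetical slicing functions rather than learned networks, and a constant-velocity extrapolation stands in for the LSTM. All names below are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, ...]

# Assumption: in this toy, the first CONTENT_DIM entries of a frame never
# change over time (stationary part); the remaining entries do (varying part).
CONTENT_DIM = 2


def encode_content(frame: Frame) -> Frame:
    """Hypothetical content encoder: returns the stationary part of a frame."""
    return frame[:CONTENT_DIM]


def encode_pose(frame: Frame) -> Frame:
    """Hypothetical pose encoder: returns the time-varying part of a frame."""
    return frame[CONTENT_DIM:]


def decode(content: Frame, pose: Frame) -> Frame:
    """Hypothetical decoder: recombines the two parts into a frame."""
    return content + pose


def extrapolate_pose(history: Sequence[Frame]) -> Frame:
    """Placeholder for the LSTM: constant-velocity step from the last two poses."""
    prev, last = history[-2], history[-1]
    return tuple(2 * b - a for a, b in zip(prev, last))


def predict_future(frames: Sequence[Frame], steps: int) -> List[Frame]:
    """Fix the content of the last observed frame and roll the pose forward."""
    content = encode_content(frames[-1])
    poses = [encode_pose(f) for f in frames]
    future = []
    for _ in range(steps):
        next_pose = extrapolate_pose(poses)
        poses.append(next_pose)  # feed the prediction back in as history
        future.append(decode(content, next_pose))
    return future


observed = [(1.0, 5.0, 0.0), (1.0, 5.0, 0.5), (1.0, 5.0, 1.0)]
future = predict_future(observed, steps=3)
print(future)
```

Because the stationary part is copied unchanged into every generated frame, only the time-varying part needs to be predicted at each step; this separation is the structure the abstract refers to, shown here without any of the learned components.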

Cite This Study

Denton et al. (2017) studied this question.

synapsesocial.com/papers/6a100785fa36b6e053fd26e2 https://doi.org/10.48550/arxiv.1705.10915
