A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

Published in Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017

Recommended citation: Marco Fraccaro, Simon Kamronn, Ulrich Paquet, Ole Winther. 2017. A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning. Advances in Neural Information Processing Systems 30. http://papers.nips.cc/paper/6951-a-disentangled-recognition-and-nonlinear-dynamics-model-for-unsupervised-learning.pdf

Abstract

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object’s representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate high dimensional frames at each time step. The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.

See more at https://sites.google.com/view/kvae

Main result

As shown in the paper, the KVAE can be trained end-to-end, and is able learn a recognition and dynamics model from the videos. The model can be used to generate new sequences, as well as to do missing data imputation without the need to generate high-dimensional frames at each time step.