Generative Query Networks
Science Article: Neural Scene Representation and Rendering
DeepMind Blog Post: Neural Scene Representation and Rendering
keywords: 3D scene understanding, vision, scene representation, variational inference, generative models, ConvDRAW, approximate inference, scene uncertainty.
Scene representation—the process of converting visual sensory data into concise descriptions—is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
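The pipeline described above (several views in, one internal representation out) can be illustrated with a minimal sketch. It assumes each (image, viewpoint) pair is encoded independently and that the codes are summed into a single scene representation. Summation is an assumption here, since the abstract does not name an aggregation rule. The encoder is a toy linear map with fixed random weights, not the network from the paper, and the names `encode_view`, `aggregate`, and `make_weights` are hypothetical. The generative step that predicts unobserved views is omitted.

```python
import math
import random


def make_weights(rep_dim, in_dim, seed=0):
    """Fixed random weights standing in for a trained encoder (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(rep_dim)]


def encode_view(image, viewpoint, weights):
    """Toy per-view encoder: linear map over pixels and camera parameters, then tanh."""
    features = list(image) + list(viewpoint)
    return [math.tanh(sum(w * f for w, f in zip(row, features))) for row in weights]


def aggregate(observations, weights):
    """Sum the per-view codes into one scene representation (assumed rule)."""
    scene = [0.0] * len(weights)
    for image, viewpoint in observations:
        code = encode_view(image, viewpoint, weights)
        scene = [s + c for s, c in zip(scene, code)]
    return scene


# Two made-up observations: 4 "pixels" and 3 viewpoint parameters each.
observations = [
    ([0.1, 0.5, 0.9, 0.3], [0.0, 1.0, 0.0]),
    ([0.7, 0.2, 0.4, 0.8], [1.0, 0.0, 0.5]),
]
weights = make_weights(rep_dim=8, in_dim=7)

scene_rep = aggregate(observations, weights)
scene_rep_reversed = aggregate(list(reversed(observations)), weights)
```

Because the codes are summed, the representation does not depend on the order in which views arrive, and the same function handles any number of views.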
Unsupervised Learning of 3D Structure from Images
NIPS 2016 Article: Unsupervised Learning of 3D Structure from Images
keywords: 3D scene understanding, vision, scene representation, variational inference, generative models, ConvDRAW3D, Spatial Transformer, volumetric data, mesh, OpenGL, approximate inference, scene uncertainty.
A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet, and establish the first benchmarks in the literature. We also show how these models and their inference networks can be trained end-to-end from 2D images. This demonstrates for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.
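Training from 2D images alone requires some map from a 3D structure to an image, so that a generated volume can be compared with what was observed. The sketch below shows the simplest such map: an orthographic silhouette of a binary voxel grid. It is a stand-in chosen for illustration, not one of the paper's projection operators, and the function name `project_silhouette` is hypothetical.

```python
def project_silhouette(volume, axis=2):
    """Orthographic silhouette of a binary voxel grid indexed as volume[x][y][z].

    A pixel is 1 if any voxel along the viewing ray (the given axis) is occupied.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # look along x; image rows are y, columns are z
        return [[max(volume[x][y][z] for x in range(nx)) for z in range(nz)]
                for y in range(ny)]
    if axis == 1:  # look along y; image rows are x, columns are z
        return [[max(volume[x][y][z] for y in range(ny)) for z in range(nz)]
                for x in range(nx)]
    if axis == 2:  # look along z; image rows are x, columns are y
        return [[max(volume[x][y][z] for z in range(nz)) for y in range(ny)]
                for x in range(nx)]
    raise ValueError("axis must be 0, 1, or 2")


# A 3x3x3 grid containing a single bar of occupied voxels along z at x=1, y=1.
volume = [[[0] * 3 for _ in range(3)] for _ in range(3)]
for z in range(3):
    volume[1][1][z] = 1

top_view = project_silhouette(volume, axis=2)   # bar seen end-on: one pixel
side_view = project_silhouette(volume, axis=0)  # bar seen from the side: a line
```

The two views of the same bar differ, which is why a single 2D image underdetermines the 3D shape and the inference has to be probabilistic.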