Learning a Predictable and Generative Vector Representation for Objects
1 Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
{rgirdhar,dfouhey,abhinavg}@cs.cmu.edu
2 MITRE Corporation, McLean, USA
[email protected]
Abstract. What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable. This enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval. Extensive experimental analysis demonstrates the usefulness and versatility of this embedding.
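The abstract describes a two-component design: an autoencoder over 3D voxel grids makes the embedding generative, and a convolutional network regressing to the same embedding from 2D images makes it predictable. The following numpy-only toy sketch illustrates that data flow and joint loss; the layer sizes, embedding dimensionality, and random linear maps are illustrative assumptions, not the paper's actual (de)convolutional architecture.

```python
# Toy sketch of the two-branch idea from the abstract: an autoencoder
# over 3D voxel grids (generative) plus a branch that predicts the same
# embedding from a 2D image (predictable). All layers here are random
# linear maps for illustration only; the real TL-embedding network uses
# 3D (de)convolutions and a ConvNet image branch.
import numpy as np

rng = np.random.default_rng(0)

VOX = 20 ** 3   # flattened 20x20x20 voxel grid (size is an assumption)
IMG = 32 * 32   # flattened toy "image" (far smaller than a real RGB input)
DIM = 64        # embedding dimensionality (also an assumption)

# Random weights standing in for trained encoder/decoder/image branches.
W_enc = rng.normal(0, 0.01, (VOX, DIM))   # 3D voxels -> embedding
W_dec = rng.normal(0, 0.01, (DIM, VOX))   # embedding -> voxel occupancies
W_img = rng.normal(0, 0.01, (IMG, DIM))   # 2D image  -> embedding

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(voxels, image):
    z = voxels @ W_enc              # embedding from the 3D autoencoder
    recon = sigmoid(z @ W_dec)      # generative decoder head
    z_img = image @ W_img           # embedding predicted from the 2D image
    eps = 1e-9
    # Per-voxel cross-entropy keeps the code generative in 3D ...
    recon_loss = -np.mean(voxels * np.log(recon + eps)
                          + (1 - voxels) * np.log(1 - recon + eps))
    # ... while an L2 tie between the two embeddings makes it predictable.
    embed_loss = np.mean((z - z_img) ** 2)
    return recon, recon_loss + embed_loss

voxels = (rng.random(VOX) < 0.1).astype(float)   # sparse occupancy grid
image = rng.random(IMG)
recon, loss = forward(voxels, image)
print(recon.shape, float(loss))
```

At test time only the image branch and the decoder are needed: an embedding predicted from a single RGB image can be decoded into a full voxel grid, which is what enables the voxel-prediction and retrieval tasks mentioned above.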
1 Introduction
What is a good vector representation for objects? On the one hand, there has been a great deal of work on discriminative models such as ConvNets [18,32] that map 2D pixels to semantic labels. This approach, while useful for distinguishing between classes given an image, has two major shortcomings: the learned representations do not necessarily capture the 3D properties of objects, and none of these approaches has shown strong generative capabilities. On the other hand, an alternate line of work focuses on learning to generate objects using 3D CAD models and deconvolutional networks [5,19]. In contrast to the purely discriminative paradigm, these approaches explicitly address the 3D nature of objects and have shown success on generative tasks; however, they offer no guarantee that their representations can be inferred from images, and accordingly they have not been shown to be useful for natural-image tasks.

In this paper, we unify these two threads of research and propose a new vector representation (embedding) of objects (Fig. 1). We believe that an object representation must satisfy two criteria. First, it must be generative in 3D: we should be able to reconstruct objects in 3D from it. Second, it must be predictable from 2D: we should be able to easily infer the representation from images. These criteria are often at odds with each other: modeling occluded voxels in 3D is useful for generating objects but very difficult to predict from an image. Thus, optimizing for only one criterion, as in most past work, tends to sacrifice the other. In contrast, we propose a novel architecture, the TL-embedding network

Fig. 1. (a) We learn an embedding space that has generative capabilities to construct 3D structures, while being predictable from RGB images. (b) Our final model's 3D reconstruction results on natural and synthetic test images. (Color figure online)

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VI, LNCS 9910, pp. 484–499, 2016. DOI: 10.1007/978-3-319-46466-4_29