Extending Long Short-Term Memory for Multi-View Structured Learning

1 Vision and Sensing, Human-Centred Technology Research Centre, University of Canberra, Canberra, Australia
[email protected], [email protected]
2 Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
{morency,tbaltrus}@cs.cmu.edu

Abstract. Long Short-Term Memory (LSTM) networks have been successfully applied to a number of sequence learning problems but they lack the design flexibility to model multiple view interactions, limiting their ability to exploit multi-view relationships. In this paper, we propose a Multi-View LSTM (MV-LSTM), which explicitly models the view-specific and cross-view interactions over time or structured outputs. We evaluate the MV-LSTM model on four publicly available datasets spanning two very different structured learning problems: multimodal behaviour recognition and image captioning. The experimental results show competitive performance on all four datasets when compared with state-of-the-art models.

Keywords: Long Short-Term Memory · Multi-View Learning · Behaviour recognition · Image Captioning

1 Introduction

There is a need for computational approaches that can model multimodal structured and sequential data. This is important for modelling human actions, caption generation and other sequence analysis problems. The integration of multimodal or multi-view data can occur at different stages. We use a general definition of a view as "a particular way of observing a phenomenon". For example, in image captioning, the views are the image and its text caption. For child engagement level prediction from videos, the views are defined by three visual descriptors: head pose, HOG and HOF. Two common ways of fusing multi-view data are early and late fusion techniques [19]. However, these techniques do not take advantage of the complex view relationships that may exist in the input data. Structured multi-view learning aims to capture view interactions, thereby exploiting these relationships for effective learning.

The key challenge in multi-view structured learning is to model both the view-specific and cross-view dynamics. The view-specific dynamics capture the interactions between hidden outputs from the same view, while the cross-view dynamics capture the interactions between hidden outputs of different views. These dynamics enable learning of subtle view relationships for better representation learning. The notion of capturing view-specific and cross-view dynamics is application specific and, hence, design flexibility is needed to model such dynamics. We propose the Multi-View LSTM (MV-LSTM), an extension of LSTM designed to model both view-specific and cross-view dynamics by partitioning the internal representations to mirror the multiple input views (see Fig. 1). We define a new family of activation functions (shown as M
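To make the partitioning idea concrete, the following is a minimal sketch in PyTorch, not the authors' implementation or exact formulation, of an LSTM-style cell whose hidden and memory states are split into per-view partitions: each partition's gates read the view's own input, its own previous hidden partition (view-specific dynamics) and the remaining views' previous hidden partitions (cross-view dynamics). The class name, dimensions and the two-view usage example are illustrative assumptions, not taken from the paper.

# Minimal, illustrative sketch of a partitioned multi-view LSTM cell.
# It is NOT the paper's MV-LSTM equations; it only illustrates splitting
# the hidden/memory state into per-view partitions, where each partition
# is updated from (a) its own input, (b) its own previous hidden
# partition (view-specific) and (c) the other views' previous hidden
# partitions (cross-view).
import torch
import torch.nn as nn


class PartitionedMultiViewLSTMCell(nn.Module):
    def __init__(self, input_sizes, hidden_size_per_view):
        super().__init__()
        self.num_views = len(input_sizes)
        self.h_size = hidden_size_per_view
        # One gate block per view: input gate, forget gate, output gate
        # and candidate, all produced by a single linear layer per view.
        cross_size = (self.num_views - 1) * self.h_size
        self.gates = nn.ModuleList(
            nn.Linear(d_in + self.h_size + cross_size, 4 * self.h_size)
            for d_in in input_sizes
        )

    def forward(self, xs, state=None):
        # xs: list of per-view tensors, each of shape (batch, input_sizes[k])
        batch = xs[0].size(0)
        if state is None:
            h = [xs[0].new_zeros(batch, self.h_size) for _ in range(self.num_views)]
            c = [xs[0].new_zeros(batch, self.h_size) for _ in range(self.num_views)]
        else:
            h, c = state
        new_h, new_c = [], []
        for k in range(self.num_views):
            # Cross-view context: the other views' previous hidden partitions.
            cross = torch.cat([h[j] for j in range(self.num_views) if j != k], dim=1)
            z = self.gates[k](torch.cat([xs[k], h[k], cross], dim=1))
            i, f, o, g = z.chunk(4, dim=1)
            c_k = torch.sigmoid(f) * c[k] + torch.sigmoid(i) * torch.tanh(g)
            h_k = torch.sigmoid(o) * torch.tanh(c_k)
            new_h.append(h_k)
            new_c.append(c_k)
        return new_h, (new_h, new_c)


# Hypothetical usage with two views (e.g. head pose and HOG features).
if __name__ == "__main__":
    cell = PartitionedMultiViewLSTMCell(input_sizes=[6, 10], hidden_size_per_view=8)
    x_pose, x_hog = torch.randn(4, 6), torch.randn(4, 10)
    hs, state = cell([x_pose, x_hog])
    print([h.shape for h in hs])  # [(4, 8), (4, 8)]

In this sketch the amount of cross-view information each partition receives is fixed; the appeal of a design such as MV-LSTM is that how much of the other views' state feeds into each partition can be configured to suit the application.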