Knowledge Transfer for Scene-Specific Motion Prediction
Computer Science Department, Stanford University, Stanford, USA
[email protected]
2 Department of Industrial and Information Engineering, Seconda Università di Napoli, Caserta, Italy
Abstract. When given a single frame of a video, humans can not only interpret the content of the scene, but also forecast the near future. This ability is mostly driven by their rich prior knowledge about the visual world, both in terms of (i) the dynamics of moving agents and (ii) the semantics of the scene. In this work we exploit the interplay between these two key elements to predict scene-specific motion patterns. First, we extract patch descriptors encoding the probability of moving to the adjacent patches, and the probability of being in that particular patch or changing behavior. Then, we introduce a Dynamic Bayesian Network which exploits this scene-specific knowledge for trajectory prediction. Experimental results demonstrate that our method accurately predicts trajectories and transfers predictions to a novel scene characterized by similar elements.
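The patch-descriptor idea in the abstract can be made concrete with a small sketch: divide the scene into a grid of patches and, from observed training trajectories, estimate for each patch the probability of moving to each of its 8 neighbours. The function name, the 8-neighbour discretisation, and the grid parameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def navigation_map(trajectories, grid_shape, scene_size):
    """Estimate, for each patch, the probability of moving to each of the
    8 adjacent patches (N, NE, E, SE, S, SW, W, NW) from observed tracks.
    Illustrative sketch only; the paper's descriptors also model staying
    in a patch and changing behavior, which are omitted here."""
    H, W = grid_shape
    counts = np.zeros((H, W, 8))
    # Channel order for the 8 neighbour offsets (row, col).
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    sy = H / scene_size[0]   # scale from scene coords to grid rows
    sx = W / scene_size[1]   # scale from scene coords to grid cols
    for traj in trajectories:
        cells = [(int(y * sy), int(x * sx)) for y, x in traj]
        for (r0, c0), (r1, c1) in zip(cells, cells[1:]):
            d = (r1 - r0, c1 - c0)
            if d in offsets:  # count only moves to adjacent patches
                counts[r0, c0, offsets.index(d)] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Normalize to probabilities; unvisited patches stay all-zero.
    return np.divide(counts, totals,
                     out=np.zeros_like(counts), where=totals > 0)
```

Each visited patch then carries a distribution over next-patch choices, which is the scene-specific knowledge the paper's Dynamic Bayesian Network consumes.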
1
Introduction
Humans glance at an image and grasp what objects and regions are present in the scene, where they are, and how they interact with each other. But they can do even more. Humans are not only able to infer what is happening at the present instant, but can also predict and visualize what may happen next. This ability to forecast the near future is mostly driven by rich prior knowledge about the visual world. Although many ingredients are involved in this process, we believe two are the main sources of prior knowledge: (i) the static semantics of the scene and (ii) the dynamics of the agents moving in this scenario. This is also supported by experiments showing that the human brain combines motion cues with static form cues in order to imply motion in our natural environment [18]. Computer vision has a rich literature on analysing human trajectories and scenes, but most previous work addresses these problems separately. Kitani et al. [16] have recently shown that by modeling the effect of the physical scene on the choice of human actions, it is possible to infer the future activity of people from visual data. Similarly, Walker et al. [37] forecast not only the possible motion in the scene but also predict visual appearances in the future. Although these works show very interesting results, they considered only a few selected

© Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part I, LNCS 9905, pp. 697–713, 2016. DOI: 10.1007/978-3-319-46448-0_42
L. Ballan et al.
Fig. 1. Given the input scene shown at the bottom, we exploit the similarity between its semantic elements and those from a collection of training scenes to enable activity forecasting (top). This is achieved by transferring the functional properties of a navigation map learned from the training set. Such properties include local dynamic properties of the target, as well as typical route choices (middle).
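Once a navigation map of per-patch transition probabilities is available (learned on training scenes or transferred to a new one), a predicted path can be read off by repeatedly following the most likely adjacent-patch transition. The sketch below is a simplified greedy first-order Markov rollout under assumed names and an assumed 8-neighbour layout; the paper itself performs inference with a Dynamic Bayesian Network rather than this greedy rule.

```python
import numpy as np

def rollout(nav_map, start, steps):
    """Greedy trajectory rollout on a navigation map where
    nav_map[r, c] is a probability vector over the 8 adjacent patches.
    Simplified stand-in for the paper's DBN inference."""
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    H, W, _ = nav_map.shape
    r, c = start
    path = [(r, c)]
    for _ in range(steps):
        probs = nav_map[r, c]
        if probs.sum() == 0:          # unvisited patch: no evidence, stop
            break
        dr, dc = offsets[int(np.argmax(probs))]
        r, c = r + dr, c + dc
        if not (0 <= r < H and 0 <= c < W):
            break                     # left the scene
        path.append((r, c))
    return path
```

Transfer then amounts to reusing the map of a training scene whose semantic elements (sidewalks, grass, obstacles) match those of the novel scene.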
classes suc