Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes



Abstract. Humans navigate crowded spaces such as a university campus by following common sense rules based on social etiquette. In this paper, we argue that in order to enable the design of new target tracking or trajectory forecasting methods that can take full advantage of these rules, we need to have access to better data in the first place. To that end, we contribute a new large-scale dataset of videos of various types of targets (not just pedestrians, but also bikers, skateboarders, cars, buses, golf carts) that navigate in a real world outdoor environment such as a university campus. Moreover, we introduce a new characterization that describes the “social sensitivity” at which two targets interact. We use this characterization to define “navigation styles” and to improve both forecasting models and state-of-the-art multi-target tracking, whereby the learnt forecasting models help the data association step.

Keywords: Trajectory forecasting · Multi-target tracking · Social Forces · UAV

1 Introduction

A. Robicquet et al., © Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 549–565, 2016. DOI: 10.1007/978-3-319-46484-8_33

When pedestrians or bicyclists navigate their way through crowded spaces such as a university campus, a shopping mall or the sidewalks of a busy street, they follow common sense conventions based on social etiquette. For instance, they yield the right-of-way at an intersection when a bike approaches quickly from the side, avoid walking on flowers, and respect personal distance. By constantly observing the environment and navigating through it, humans have learnt the way other humans typically interact with the physical space as well as with the targets that populate such spaces, e.g., humans, bikes, skaters, electric carts, cars, toddlers, etc. They use these learned principles to operate in very complex scenes with extraordinary proficiency. Researchers have demonstrated that it is indeed possible to model the interaction between humans and their surroundings to improve or solve numerous computer vision tasks: for instance, to make pedestrian tracking more robust and accurate [1–5], to enable the understanding of activities performed by groups of individuals [6–9], and to enable accurate prediction of target trajectories at future instants [10–13]. Most of the time, however, these approaches operate under restrictive assumptions whereby the type and number of interactions are limited, or the testing environment is contrived or artificial.
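Models in the social-force tradition cited above typically combine an attractive term pulling each target toward its goal with pairwise repulsive terms between targets. The sketch below is illustrative only: the exponential repulsion form, the constants, and the function names are textbook-style assumptions introduced here, not the model proposed in this paper.

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.4,
                      tau=0.5, a=2.0, b=1.0):
    """One Euler step of a simplified social-force model.

    pos, vel, goal: (2,) arrays for a single agent.
    others: (N, 2) array of the other agents' positions.
    tau, a, b: relaxation time and repulsion strength/range
    (illustrative values, not fitted to any dataset).
    """
    desired_speed = 1.3  # m/s, a typical walking speed
    direction = goal - pos
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    # Attractive term: relax toward the desired velocity.
    force = (desired_speed * direction - vel) / tau
    # Repulsive term: exponential decay with distance to each neighbour.
    for o in others:
        diff = pos - o
        dist = np.linalg.norm(diff) + 1e-9
        force += a * np.exp(-dist / b) * diff / dist
    vel = vel + dt * force
    return pos + dt * vel, vel
```

Iterating this step yields a predicted trajectory; forecasting methods in this family differ mainly in how the interaction term is parameterized and how its parameters are learnt from data.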

Fig. 1. We aim to understand human social navigation in a multi-class setting where pedestrians, bicyclists, skateboarders and carts (to name a few) share the same space. To that end, we have collected a new dataset with a quadcopter flying over more than 100 different crowded campus scenes.

In this paper, we argue that in order to learn and use models that allow mimicking, for instance, the remarkable human capability to navigate in complex and crowded scenes, the research community needs access to better data in the first place.
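The abstract's "social sensitivity" characterization groups targets into a small number of "navigation styles" based on how closely they interact with their neighbours. As a minimal illustration of that idea only, one could cluster per-target distance features with k-means; the features below are fabricated for the example and the clustering is not the paper's actual formulation.

```python
import numpy as np

# Hypothetical per-target features: mean and minimum distance (metres)
# each target keeps from its nearest neighbour over its track.
# Values are fabricated for illustration only.
features = np.array([
    [0.6, 0.3],   # e.g., a pedestrian weaving through a crowd
    [0.7, 0.4],
    [2.5, 1.8],   # e.g., a cyclist keeping a wide berth
    [2.8, 2.0],
    [1.5, 1.0],
])

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: each cluster is one candidate 'navigation style'."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each target to its nearest center.
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Recompute centers; keep a center if its cluster went empty.
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

labels, centers = kmeans(features, k=2)
```

On these toy features the two clusters separate close-interaction targets from wide-berth ones, which is the intuition behind conditioning a forecasting model, or the data association step of a tracker, on a target's navigation style.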