Tracking Persons-of-Interest via Adaptive Discriminative Features


1 Xi'an Jiaotong University, Xi'an, China
2 University of Illinois, Urbana-Champaign, Champaign, USA
[email protected]
3 Hanyang University, Seoul, South Korea
4 University of California, Merced, USA
http://shunzhang.me.pn/papers/eccv2016/

Abstract. Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Low-level features used in existing multi-target tracking methods are not effective for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face features using convolutional neural networks (CNNs). Unlike existing CNN-based approaches that are only trained on large-scale face image datasets offline, we further adapt the pre-trained face CNN to specific videos using automatically discovered training samples from tracklets. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function. With the learned discriminative features, we apply the Hungarian algorithm to link tracklets within each shot and the hierarchical clustering algorithm to link tracklets across multiple shots to form final trajectories. We extensively evaluate the proposed algorithm on a set of TV sitcoms and music videos and demonstrate significant performance improvement over existing techniques.
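As a rough illustration of two ingredients named in the abstract, the sketch below shows a standard triplet hinge loss on squared Euclidean distances and a minimum-cost one-to-one linking of tracklets. It is not the improved triplet loss of this paper, whose form is not given in this excerpt. The function names, the margin value of 0.2, and the representation of each tracklet by a single embedding vector are assumptions made for illustration. The linking step uses an exhaustive search over permutations as a stand-in for the Hungarian algorithm: it reaches the same optimal assignment but is only practical for a handful of tracklets.

```python
import math
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet hinge loss (not the paper's improved variant).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (a hypothetical value here).
    """
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def link_tracklets(ended, started):
    """Match each ended tracklet to a distinct started tracklet.

    Minimizes the total squared embedding distance by exhaustive search.
    This is a stand-in for the Hungarian algorithm, which finds the same
    optimum in polynomial time. Returns (ended_index, started_index) pairs.
    """
    if len(ended) > len(started):
        raise ValueError("this sketch assumes len(ended) <= len(started)")
    best_cost, best_perm = math.inf, ()
    for perm in permutations(range(len(started)), len(ended)):
        cost = sum(sq_dist(ended[i], started[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


if __name__ == "__main__":
    # Two tracklets end and two begin; embeddings are made-up 2-D vectors.
    ended = [[0.0, 0.0], [1.0, 1.0]]
    started = [[1.0, 0.9], [0.1, 0.0]]
    print(link_tracklets(ended, started))
```

For realistic numbers of tracklets, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive search; the cost matrix would be built from the same pairwise distances.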

1 Introduction

Multi-target tracking (MTT) aims at locating all targets of interest (e.g., faces, players, and cars) and inferring their trajectories in a video sequence over time while maintaining their identities. Multi-face tracking is one important domain of MTT that applies to numerous high-level video understanding tasks such as face recognition, content-based retrieval, surveillance, and group interaction analysis. The goal of multi-face tracking in unconstrained scenarios is to track faces in videos that are generated from multiple moving cameras with different views or scenes, as shown in Fig. 1. Examples include automatic character tracking in movies, TV sitcoms, or music videos. It has attracted increased attention in recent years due to the fast growing popularity of such videos on the Internet. Unlike tracking in the constrained counterparts (e.g., a video from a single camera that is either fixed or moved slowly) where the main chal

Fig. 1. We focus on tracking multiple faces according to their unknown identities in unconstrained videos, which consist of many shots from different cameras. The main challenge is to address large face appearance variations from different shots due to changes in pose, view angle, scale, makeup, illumination, camera motion and heavy occlusions.

S. Zhang et al. © Springer International Publishing AG 2016. B. Leibe et al. (Eds.): ECCV 2016, Part V, LNCS 9909, pp. 415–433, 2016. DOI: 10.1007/978-3-319-46454-1_26
