Hierarchical multi-label propagation using speaking face graphs for multimodal person discovery


Gabriel Barbosa da Fonseca1 · Gabriel Sargent2 · Ronan Sicre2 · Zenilton K. G. Patrocínio Jr1 · Guillaume Gravier3 · Silvio Jamil F. Guimarães1

Received: 4 February 2020 / Revised: 13 July 2020 / Accepted: 21 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

TV archives are growing so fast that manual indexing becomes infeasible. Automatic indexing techniques can be applied to overcome this issue, and this work proposes an unsupervised technique for multimodal person discovery. To achieve this goal, we propose a hierarchical label propagation technique based on quasi-flat zones theory that learns from labeled and unlabeled data and propagates names through a multimodal graph representation. In this representation, we combine audio, video, and text processing techniques to model the data as a graph of speaking faces. In the proposed modeling, we extract names via optical character recognition and propagate them through the graph using audiovisual relationships between speaking faces. We also use a random walk label propagation and two graph clustering strategies as baselines. The proposed label propagation techniques always outperform the clustering baselines in the quantitative assessments. Our approach also outperforms all literature methods tested on the same dataset except for one, which uses a different preprocessing step. The proposed hierarchical label propagation and the random walk baseline produce highly equivalent results according to the Kappa coefficient, but the hierarchical propagation is parameter-free and over 9 times faster than the random walk under the same configurations.

Keywords Multimedia indexing · Multimodal fusion · Label propagation · Face recognition · Speaker recognition
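To make the idea of propagating names hierarchically through a similarity graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that speaking faces are graph nodes, that edges carry a similarity score, and that a few nodes already have a name (for example, from OCR). Edges are visited from most to least similar, components are merged with a union-find structure, and an unnamed component inherits the name of the named component it merges with. The function name `propagate_labels`, the input layout, and the rule of never merging two already-named components are all illustrative assumptions. The sketch also assigns at most one name per node.

```python
def propagate_labels(num_nodes, edges, seeds):
    """Hypothetical sketch of hierarchical name propagation.

    num_nodes: number of speaking-face nodes, indexed 0..num_nodes-1
    edges:     list of (similarity, u, v) tuples (assumed input layout)
    seeds:     dict mapping a node index to an initial name
    Returns a list with one name (or None) per node.
    """
    parent = list(range(num_nodes))
    # Name of each component, stored at the component's root node.
    label = dict(seeds)

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit edges from the most similar pair to the least similar pair,
    # which mimics walking up a hierarchy of increasingly coarse zones.
    for _, u, v in sorted(edges, key=lambda e: e[0], reverse=True):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same component
        lu, lv = label.get(ru), label.get(rv)
        if lu is not None and lv is not None:
            continue  # assumption: two named components are never merged
        parent[rv] = ru  # merge the component of v into the component of u
        if lu is None and lv is not None:
            label[ru] = lv  # the merged component inherits the only name

    return [label.get(find(n)) for n in range(num_nodes)]


# Toy graph: node 0 is named "Alice", node 4 is named "Bob", node 5 is isolated.
toy_edges = [(0.9, 0, 1), (0.8, 3, 4), (0.5, 1, 2), (0.4, 2, 3)]
toy_result = propagate_labels(6, toy_edges, {0: "Alice", 4: "Bob"})
print(toy_result)
```

In this toy graph, nodes 1 and 2 receive "Alice" through their stronger links to node 0, node 3 receives "Bob", and the isolated node 5 stays unnamed. The sketch needs no tuning parameter, since the ordering of the edges alone drives the propagation.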

Silvio Jamil F. Guimarães

Extended author information available on the last page of the article.

Multimedia Tools and Applications

1 Introduction

With TV being one of the main means of communication during the past decades, the amount of content produced and stored by TV channels is extremely vast and is continuously growing. However, an extensive amount of data is of little use if it is not searchable, and with that in mind many approaches for automatically indexing TV videos have been developed. Indexes that represent the identity of people in these archives are essential when searching for content, since human nature leads people to be very interested in other people. However, at the moment content is created or broadcast, it is not always possible to predict which people will be the most relevant in the future. For this reason, it is not possible to assume that a model capable of detecting a specific individual will be available at indexing time. This, combined with the impossibility of manually labeling entire databases, results in the creation of partially, and usually minimally, annotated archives. To solve such a
