RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild


ORIGINAL ARTICLE

RealHePoNet: a robust single-stage ConvNet for head pose estimation in the wild

Rafael Berral-Soler1 · Francisco J. Madrid-Cuevas1 · Rafael Muñoz-Salinas1 · Manuel J. Marín-Jiménez1

Received: 11 May 2020 / Accepted: 4 November 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Human head pose estimation in images has applications in many fields, such as human–computer interaction or video surveillance tasks. In this work, we address this problem, defined here as the estimation of both the vertical (tilt/pitch) and horizontal (pan/yaw) angles, through the use of a single Convolutional Neural Network (ConvNet) model, trying to balance precision and inference speed in order to maximize its usability in real-world applications. Our model is trained on the combination of two datasets: 'Pointing'04' (aiming at covering a wide range of poses) and 'Annotated Facial Landmarks in the Wild' (in order to improve the robustness of our model on real-world images). Three different partitions of the combined dataset are defined and used for training, validation and testing purposes. As a result of this work, we have obtained a trained ConvNet model, coined RealHePoNet, which, given a low-resolution grayscale input image and without the need for facial landmarks, is able to estimate both tilt and pan angles with low error (4.4° average error on the test partition). Also, given its low inference time (6 ms per head), we consider our model usable even when paired with medium-spec hardware (i.e. a GTX 1060 GPU).

Code available at: https://github.com/rafabs97/headpose_final
Demo video at: https://www.youtube.com/watch?v=2UeuXh5DjAE

Keywords: Human head pose estimation · ConvNets · Human–computer interaction · Deep Learning

Abbreviations
AFLW     Annotated Facial Landmarks in the Wild
CNN      Convolutional Neural Network
Conv     Convolution
ConvNet  Convolutional Neural Network
CT       Confidence Threshold
FC       Fully connected
flops    Floating point operations per second
FPS      Frames per second
HPE      Head pose estimation
IoU      Intersection over Union
MAE      Mean Absolute Error
MSE      Mean Squared Error
SSD      Single Shot Detector
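To make the intended usage concrete, the following is a minimal inference sketch for a single-stage estimator of this kind. It assumes a Keras model file exported from the repository linked above; the file name ("head_pose_model.h5"), the 64x64 grayscale input size, the [0, 1] normalization and the (tilt, pan) output order are illustrative assumptions, not the repository's documented interface.

```python
# Minimal inference sketch (hypothetical interface; see lead-in for assumptions).
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("head_pose_model.h5")  # hypothetical file name

def estimate_pose(head_crop_bgr):
    """Return (tilt, pan) in degrees for a cropped head image."""
    gray = cv2.cvtColor(head_crop_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (64, 64))                      # assumed input size
    x = gray.astype(np.float32)[np.newaxis, :, :, np.newaxis] / 255.0
    tilt, pan = model.predict(x, verbose=0)[0]             # assumed output order
    return float(tilt), float(pan)
```

Because the model works directly on low-resolution grayscale crops and needs no facial landmarks, a call like this can be placed straight after any head detector in a per-frame loop, which is what keeps the reported per-head inference time low.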

Correspondence to: Manuel J. Marín-Jiménez, [email protected]

1 Department of Computing and Numerical Analysis, University of Cordoba, Cordoba, Spain

1 Introduction

Given a human head detected in a picture, we can define the task of head pose estimation (HPE) as the estimation, relative to the camera, of both the vertical (tilt/pitch) and horizontal (pan/yaw) angles (see Fig. 1); a third angle (roll) can also be estimated, but it falls outside the scope of this work. Human head pose estimation is useful in many situations: for instance, in vehicles (detecting whether the driver is paying attention to the road [31]), human–computer interaction (detecting where the user's attention is being drawn [44]), social interaction understanding (detecting whether people are looking at each other [28]), video surveillance systems [18, 36], or to aid various aerial cinematography tasks [3