HOTA: A Higher Order Metric for Evaluating Multi-object Tracking



Jonathon Luiten1 · Aljoša Ošep2 · Patrick Dendorfer2 · Philip Torr3 · Andreas Geiger4,5 · Laura Leal-Taixé2 · Bastian Leibe1

Received: 2 May 2020 / Accepted: 19 August 2020
© The Author(s) 2020

Abstract Multi-object tracking (MOT) has been notoriously difficult to evaluate. Previous metrics overemphasize the importance of either detection or association. To address this, we present a novel MOT evaluation metric, higher order tracking accuracy (HOTA), which explicitly balances the effects of accurate detection, association and localization in a single unified metric for comparing trackers. HOTA decomposes into a family of sub-metrics that evaluate each of five basic error types separately, enabling clear analysis of tracking performance. We evaluate the effectiveness of HOTA on the MOTChallenge benchmark and show that it captures important aspects of MOT performance not previously taken into account by established metrics. Furthermore, we show that HOTA scores align better with human visual evaluation of tracking performance.

Keywords Multi-object tracking · Evaluation metrics · Visual tracking

Communicated by Daniel Scharstein.

1 RWTH Aachen University, Aachen, Germany
2 Technical University Munich, Munich, Germany
3 University of Oxford, Oxford, UK
4 Max Planck Institute for Intelligent Systems, Tübingen, Germany
5 University of Tübingen, Tübingen, Germany

1 Introduction

Multi-Object Tracking (MOT) is the task of detecting the presence of multiple objects in video and associating these detections over time according to object identities. MOT is one of the key pillars of computer vision research and is essential for many scene understanding tasks in areas such as surveillance, robotics and self-driving vehicles. Unfortunately, evaluating MOT algorithms has proven very difficult. MOT is a complex task that requires accurate detection, localisation, and association over time. This paper defines a metric, called HOTA (Higher Order Tracking Accuracy), which evaluates all of these aspects of tracking. We provide an extended analysis of why HOTA is often preferable to current alternatives for evaluating MOT algorithms.

As can be seen in Fig. 1, the currently used metrics MOTA (Bernardin and Stiefelhagen 2008) and IDF1 (Ristani et al. 2016) overemphasize detection and association, respectively. HOTA explicitly measures both types of errors and combines them in a balanced way. HOTA also measures the localisation accuracy of tracking results, which neither MOTA nor IDF1 takes into account. HOTA can be used as a single unified metric for ranking trackers, while also decomposing into a family of sub-met