Point-Wise Mutual Information-Based Video Segmentation with High Temporal Consistency




Abstract. In this paper, we tackle the problem of temporally consistent boundary detection and hierarchical segmentation in videos. While finding the best high-level reasoning of region assignments in videos is the focus of much recent research, temporal consistency in boundary detection has so far only rarely been tackled. We argue that temporally consistent boundaries are a key component to temporally consistent region assignment. The proposed method is based on the point-wise mutual information (PMI) of spatio-temporal voxels. Temporal consistency is established by an evaluation of PMI-based point affinities in the spectral domain over space and time. Thus, the proposed method is independent of any optical flow computation or previously learned motion models. The proposed low-level video segmentation method outperforms the learning-based state of the art in terms of standard region metrics.
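The affinity at the core of the method is the point-wise mutual information (PMI) between feature observations at pairs of spatio-temporal voxels, following the image-segmentation affinity of [18]: PMI_ρ(A, B) = log(P(A, B)^ρ / (P(A)·P(B))). The sketch below is illustrative only; the exponent `rho`, the smoothing constant, and the way the densities are obtained are assumptions here, whereas the paper estimates the joint and marginal densities from the video itself.

```python
import numpy as np

def pmi_affinity(p_joint, p_a, p_b, rho=1.25):
    """PMI-style affinity between two feature observations A and B.

    Implements PMI_rho(A, B) = log( P(A, B)**rho / (P(A) * P(B)) ).
    An exponent rho > 1 down-weights feature pairs that are individually
    very common, so high affinity requires the pair to co-occur more often
    than its marginals alone would suggest.  The value rho=1.25 is an
    illustrative choice, not taken from the paper.
    """
    eps = 1e-12  # guards against log(0) for unseen feature pairs
    return np.log((p_joint ** rho + eps) / (p_a * p_b + eps))
```

For rho = 1 this reduces to classical point-wise mutual information: two features that co-occur ten times more often than chance (e.g. P(A, B) = 0.1 with P(A) = P(B) = 0.1) receive affinity log 10.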

1 Introduction

Accurate video segmentation is an important step in many high-level computer vision tasks. It can provide, for example, window proposals for object detection [12,27] or action tubes for action recognition [13,14]. One of the key challenges in video segmentation is handling the large amount of data. Traditionally, methods either build upon some fine-grained image segmentation [2] or supervoxel [33] method [9,10,21,22,35], or they consist of grouping previously computed point trajectories (e.g. [19,24]) and transforming them into dense segmentations in a postprocessing step [25]. The latter is well suited for motion segmentation applications but has general issues with segmenting non-moving, or only slightly moving, objects. Indeed, image segmentation into small segments forms the basis for many high-level video segmentation methods like [9,22,35]. A key question when employing such preprocessing is the error it introduces. While state-of-the-art image segmentation methods [2,3,18] offer highly precise boundary localization, they usually suffer from low temporal consistency, i.e., the superpixel shapes and sizes can change drastically from one frame to the next. This causes undesired flickering effects in high-level segmentation methods. In this paper, we present a low-level video segmentation method that aims at producing spatio-temporal superpixels with high temporal consistency in a

© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part III, LNCS 9915, pp. 789–803, 2016. DOI: 10.1007/978-3-319-49409-8_65


M. Keuper and T. Brox

Fig. 1. Results of the proposed hierarchical video segmentation method for frames 4, 14, and 24 of the ballet sequence from VSB100 [10]. The segmentation is displayed in a hot color map. Note that corresponding contours have exactly the same value. Segmentations at different thresholds in this contour map are segmentations of the spatio-temporal volume.
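The hierarchical property noted in the caption of Fig. 1, namely that cutting the contour map at different values yields nested segmentations, can be sketched as follows. The toy contour map and threshold values are hypothetical; the paper's contour maps arise from the spectral analysis of PMI affinities, not from this construction.

```python
import numpy as np
from scipy import ndimage

def segment_at_threshold(ucm, thresh):
    """Segment a contour map at one threshold level.

    Pixels whose contour strength falls below `thresh` are grouped into
    4-connected regions.  Raising the threshold removes weaker contours
    first, so regions only ever merge, which yields a nested hierarchy
    of segmentations from a single contour map.
    """
    interior = ucm < thresh              # non-boundary pixels at this level
    labels, n_regions = ndimage.label(interior)
    return labels, n_regions

# Toy example: a single vertical contour of strength 0.8.
ucm = np.zeros((5, 5))
ucm[:, 2] = 0.8
_, n_low = segment_at_threshold(ucm, 0.5)   # contour survives: 2 regions
_, n_high = segment_at_threshold(ucm, 0.9)  # contour removed: 1 region
```

Because corresponding contours carry exactly the same value across frames (as stated in the caption), applying one threshold to the whole spatio-temporal volume produces temporally consistent region boundaries.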

bottom-up way (Fig. 1). To this end, we employ an affinity measure that has recently been proposed for image segmentation [18]. While other, learning-based methods such as [3] slightl