Clockwork Convnets for Video Semantic Segmentation


Abstract. Recent years have seen tremendous progress in still-image segmentation; however, the naïve application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video. We propose a video recognition framework that relies on two key observations: (1) while pixels may change rapidly from frame to frame, the semantic content of a scene evolves more slowly, and (2) execution can be viewed as an aspect of architecture, yielding purpose-fit computation schedules for networks. We define a novel family of “clockwork” convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability. We design a pipeline schedule to reduce latency for real-time recognition and a fixed-rate schedule to reduce overall computation. Finally, we extend clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video. The accuracy and efficiency of clockwork convnets are evaluated on the YouTube-Objects, NYUD, and Cityscapes video datasets.

1 Introduction

Semantic segmentation is a central visual recognition task. End-to-end convolutional network approaches have made progress on the accuracy and execution time of still-image semantic segmentation, but video semantic segmentation has received less attention. Potential applications include UAV navigation, autonomous driving, archival footage recognition, and wearable computing. The computational demands of video processing are a challenge to the simple application of image methods on every frame, while the temporal continuity of video offers an opportunity to reduce this computation. Fully convolutional networks (FCNs) [1–3] have been shown to obtain remarkable results, but the execution time of repeated per-frame processing limits application to video.

Fig. 1. Our adaptive clockwork method illustrated with the famous The Horse in Motion [9], captured by Eadweard Muybridge in 1878 at the Palo Alto racetrack. The clock controls network execution: past the first stage, computation is scheduled only at the time points indicated by the clock symbol. During static scenes cached representations persist, while during dynamic scenes new computations are scheduled and output is combined with cached representations.
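To make the adaptive clock of Fig. 1 concrete, the following is a minimal sketch, not the authors' implementation: the names adaptive_clockwork, shallow, deep, fuse, and theta are hypothetical stand-ins for the network stages and firing threshold, and the mean absolute difference is a simplified proxy for the semantic change measure.

```python
import numpy as np

def relative_change(a, b):
    # A simple proxy for semantic change between consecutive frames
    # (the actual firing criterion is a modeling choice).
    return np.abs(a - b).mean() / (np.abs(b).mean() + 1e-8)

def adaptive_clockwork(frames, shallow, deep, fuse, theta):
    """Compute the shallow stage on every frame; fire the deep stage's
    clock only when the shallow features change by more than theta."""
    prev_feat = None
    cached_deep = None
    outputs = []
    for frame in frames:
        feat = shallow(frame)  # always computed
        if cached_deep is None or relative_change(feat, prev_feat) > theta:
            cached_deep = deep(feat)  # clock fires: refresh deep features
        prev_feat = feat
        outputs.append(fuse(feat, cached_deep))  # fresh shallow + cached deep
    return outputs
```

During static scenes the change measure stays below theta, so the cached deep representation persists and only the cheap shallow stage runs, matching the behavior the caption describes.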

Adapting these networks to make use of the temporal continuity of video reduces inference computation while suffering minimal loss in recognition accuracy. The temporal rate of change of features, or feature “velocity”, varies from layer to layer: deeper layers are more semantically stable and so change more slowly over time.
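This observation suggests a fixed-rate schedule in which deeper stages are updated less often than shallow ones. As a minimal sketch under the assumption that each stage is a callable returning feature maps (the names clockwork_fixed_rate and fuse, and the example rates, are illustrative rather than the paper's exact configuration):

```python
def clockwork_fixed_rate(frames, stages, fuse, rates):
    """Run stage i only every rates[i] frames; otherwise persist its cache.

    stages: callables ordered shallow to deep; rates: update periods such
    as (1, 2, 4), so deeper (more slowly changing) stages run less often.
    """
    cache = [None] * len(stages)
    outputs = []
    for t, frame in enumerate(frames):
        x = frame
        for i, (stage, rate) in enumerate(zip(stages, rates)):
            if t % rate == 0 or cache[i] is None:
                cache[i] = stage(x)  # clock fires: recompute this stage
            x = cache[i]             # input to the next stage, fresh or stale
        outputs.append(fuse(cache))  # fuse per-stage features, as in FCN skips
    return outputs
```

Note that when a deep stage's clock is off, fresh shallow features do not propagate through it, but they still reach the output through the fusion of cached per-stage features, in the spirit of FCN skip connections.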