Online Adaptation for Joint Scene and Object Classification




Abstract. Recent efforts in computer vision consider joint scene and object classification by exploiting mutual relationships (often termed as context) between them to achieve higher accuracy. On the other hand, there is also a lot of interest in online adaptation of recognition models as new data becomes available. In this paper, we address the problem of how models for joint scene and object classification can be learned online. A major motivation for this approach is to exploit the hierarchical relationships between scenes and objects, represented as a graphical model, in an active learning framework. To select the samples on the graph, which need to be labeled by a human, we use an information theoretic approach that reduces the joint entropy of scene and object variables. This leads to a significant reduction in the amount of manual labeling effort for similar or better performance when compared with a model trained with the full dataset. This is demonstrated through rigorous experimentation on three datasets.

Keywords: Scene classification · Object detection · Active learning

1 Introduction

Scene classification and object detection are two challenging problems in computer vision due to high intra-class variance, illumination changes, background clutter, and occlusion. Most existing methods assume that labeled data is available beforehand to train the classification models. With the huge corpus of visual data being generated daily, it becomes infeasible and unrealistic to know all the labels in advance. Moreover, adaptability of the models to incoming data is also crucial for long-term performance guarantees. Currently, the big datasets (e.g., ImageNet [1], SUN [2]) are prepared with intensive human labeling, which is difficult to scale up as more and more new images are generated. So, we pose a question: 'Are all the samples equally important to manually label and learn a model from?' We address this question in the context of joint scene and object classification.

Electronic supplementary material The online version of this chapter (doi:10.1007/978-3-319-46484-8_14) contains supplementary material, which is available to authorized users.

© Springer International Publishing AG 2016
B. Leibe et al. (Eds.): ECCV 2016, Part VIII, LNCS 9912, pp. 227–243, 2016. DOI: 10.1007/978-3-319-46484-8_14


J.H. Bappy et al.

Fig. 1. This figure illustrates the motivation for incorporating relationships among scene and object samples within an image. Here, the scene (S) and objects (O1, O2, ..., O6) are predicted by our initial classifier and detectors with some uncertainty. We formulate a graph exploiting scene-object (S-O) and object-object (O-O) relationships. As shown in the figure, even though the nodes {S, O2, O3, O4, O5, O6} have high uncertainty, manually labeling only 3 of them is sufficient to reduce the uncertainty of all the nodes when the S-O and O-O relationships are considered. Thus, our proposed approach can significantly reduce the manual labeling cost.

Active learning [3