Perfect Accuracy with Human-in-the-Loop Object Detection

Abstract. Modern state-of-the-art computer vision systems still perform imperfectly on many benchmark object recognition tasks. This hinders their application to real-time tasks, where even a low but non-zero probability of error in analyzing every frame from a camera quickly accumulates to unacceptable performance for end users. Here we consider a visual aid that guides blind or visually-impaired persons in finding items in grocery stores using a head-mounted camera. The system uses a human-in-the-decision-loop approach: when an object is detected with low confidence, it instructs the user how to turn or move so as to improve the object's view captured by the camera, until the computer vision confidence exceeds the highest mistaken confidence observed during algorithm training. In experiments with 42 blindfolded participants reaching for 25 different objects randomly arranged on shelves 15 times, our system achieved 100% accuracy, with all participants selecting the goal object in all trials.

Keywords: Scene understanding · Quality of life technologies · Sensory substitution · Mobile and wearable systems · Applications for the visually impaired · Egocentric and first-person vision · Computer vision · Object detection
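The decision loop summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the instruction vocabulary ("turn", "grasp"), and the per-frame detection format are all assumptions.

```python
# Sketch of the human-in-the-decision-loop guidance: a detection is
# trusted only when its confidence exceeds a safe threshold (the
# highest confidence the detector ever assigned to a mistake during
# training). All names here are illustrative assumptions.

def guide_user(frames, goal, threshold):
    """Issue 'turn' instructions until the goal object is detected with
    confidence above the safe threshold, then instruct the grasp."""
    instructions = []
    for label, confidence in frames:  # simulated per-frame detections
        if label == goal and confidence > threshold:
            instructions.append("grasp")  # trusted detection: guide the hand
            break
        instructions.append("turn")       # low confidence: request a better view
    return instructions

# Simulated detections as the user turns toward the shelf.
frames = [("cereal", 0.55), ("cereal", 0.78), ("cereal", 0.93)]
print(guide_user(frames, "cereal", threshold=0.90))  # ['turn', 'turn', 'grasp']
```

Because the loop only terminates on a detection above the threshold, a frame that never clears the bar simply produces another movement instruction, which matches the paper's claim that errors cannot silently accumulate.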

1 Introduction and Background

People who are blind have more difficulty navigating the world than those with sight, even in places they have visited before [8,23]; blindness affects 39 million people worldwide [32]. As technology has advanced, much progress has been made in developing electronic travel aids to assist them. One approach converts images to soundscapes, which some subjects can learn to interpret well enough to differentiate places and to identify and locate some objects [27]. Others localize the user in an environment using stereo cameras, accelerometers, and even WiFi access points [6,13]. Traditional aids have also been augmented: canes have electronic replacements that use, e.g., sonar to increase their warning range or provide the same feedback without a physical cane [20,31], and guide dogs have been replaced with robots [16]. Among these devices, many use computer vision to help with navigation, text reading, and object recognition [1,18–20,29]. Despite many advances in computer vision, even state-of-the-art algorithms have not yet achieved perfect accuracy on standard datasets [7,12,28].

© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 360–374, 2016. DOI: 10.1007/978-3-319-48881-3_25

Our algorithm's success is founded in the areas of dynamic thresholding and active vision [2]. Active vision is the process of changing views to better identify what is being looked at, either by changing the pose of the camera or by choosing a region of interest within a larger field of view and then attempting identification on a zoomed-in image of that region [3,5,11,30]. Dynamic thresholding is any recognitio
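The threshold criterion the abstract describes — accept a detection only when its confidence exceeds the highest confidence ever assigned to a mistaken detection during training — can be computed with a one-pass scan. The tuple layout `(predicted, true, confidence)` below is an assumption for illustration, not the authors' data format.

```python
# A minimal sketch of deriving the acceptance threshold: the bar is the
# largest confidence the detector assigned to any *mistaken* detection
# on the training set. The record layout is an illustrative assumption.

def mistake_threshold(training_detections):
    """Return the largest confidence among incorrect predictions,
    given records of (predicted_label, true_label, confidence)."""
    mistaken = [conf for pred, true, conf in training_detections
                if pred != true]
    return max(mistaken, default=0.0)

# Toy training run: one mistake at confidence 0.81 sets the bar.
dets = [("soup", "soup", 0.97), ("soup", "rice", 0.81), ("rice", "rice", 0.88)]
print(mistake_threshold(dets))  # 0.81
```

Setting the bar this way trades recall for precision: some correct detections below the threshold are rejected (prompting another movement instruction), but no detection weaker than the worst observed training mistake is ever acted upon.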