Multi-level Net: A Visual Saliency Prediction Model



Abstract. State of the art approaches for saliency prediction are based on Fully Convolutional Networks, in which saliency maps are built using the last layer. In contrast, we here present a novel model that predicts saliency maps by exploiting a non-linear combination of features coming from different layers of the network. We also present a new loss function to deal with the imbalance issue on saliency masks. Extensive results on three public datasets demonstrate the robustness of our solution. Our model outperforms the state of the art on SALICON, which is the largest unconstrained dataset available, and obtains competitive results on the MIT300 and CAT2000 benchmarks.

Keywords: Visual saliency · Saliency prediction · Convolutional neural network · Deep learning
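The abstract mentions a loss function designed for the imbalance on saliency masks, where most pixels are non-salient. As a purely illustrative sketch (not the paper's exact formulation), one plausible choice is a per-pixel squared error whose weight grows with the ground-truth saliency, so the few highly salient pixels are not drowned out by the background; the weighting scheme and the constant `alpha` below are assumptions for illustration:

```python
import numpy as np

def weighted_saliency_mse(pred, target, alpha=1.1):
    """Illustrative imbalance-aware loss for saliency prediction.

    Each pixel's squared error is weighted by 1 / (alpha - target),
    with target in [0, 1] and alpha slightly above 1, so errors on
    highly salient (rare) pixels cost more than errors on the
    (abundant) non-salient background.
    """
    weights = 1.0 / (alpha - target)          # salient pixels -> large weight
    return float(np.mean(weights * (pred - target) ** 2))
```

For example, with `alpha = 1.1` a pixel whose ground-truth saliency is 1 is weighted 10, while a background pixel (saliency 0) is weighted roughly 0.9, an eleven-fold difference.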

1 Introduction

For many applications in image and video compression, video re-targeting and object segmentation, estimating where humans look in a scene is an essential step [6,9,22]. Neuroscientists [2], and more recently computer vision researchers [13], have proposed computational saliency models to predict eye fixations over images. Most traditional approaches cope with this task by defining hand-crafted and multi-scale features that capture a large spectrum of stimuli: lower-level features (color, texture, contrast) [11] or higher-level concepts (faces, people, text, horizon) [5]. In addition, since there is a strong tendency to look more frequently around the center of the scene than around the periphery [33], some techniques incorporate hand-crafted priors into saliency maps [19,20,35,36]. Unfortunately, eye fixations depend on several aspects, and this makes it difficult to design appropriate hand-crafted features.

Deep learning techniques, with their ability to automatically learn appropriate features from massive annotated data, have shown impressive results in several vision applications such as image classification [18] and semantic segmentation [24]. First attempts to define saliency models with deep convolutional networks have recently been presented [20,35]. However, due to the small amount of training data in this scenario, researchers have presented networks with few layers or pretrained in other contexts. With the publication of the large SALICON dataset [12], collected thanks to crowd-sourcing techniques, researchers have then increased the number of convolutional layers, reducing the overfitting risk [19,25].

In this paper we present a general deep learning framework to predict saliency maps, called ML-Net. Differently from previous deep learning approaches, which build saliency maps from the last convolutional layer only, we propose a network that is able to combine multiple features coming from different layers of the network. The proposed solution is also able to learn its own prior from the training data.

© Springer International Publishing Switzerland 2016. G. Hua and H. Jégou (Eds.): ECCV 2016 Workshops, Part II, LNCS 9914, pp. 302–315, 2016. DOI: 10.1007/978-3-319-48881-3_21
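The core idea of combining features from different layers can be sketched in a few lines. The following is a minimal numpy illustration under stated assumptions, not the authors' exact architecture: feature maps from different depths (and hence different spatial resolutions) are resized to a common grid, stacked, and merged by a learned weighted sum (the equivalent of a 1x1 convolution) followed by a ReLU, yielding a single-channel saliency map. The nearest-neighbor resize and the specific merge are illustrative choices:

```python
import numpy as np

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbor resize of a 2-D feature map to (out_h, out_w)."""
    h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[rows][:, cols]

def combine_multilevel_features(feature_maps, weights, bias, out_shape):
    """Non-linearly combine feature maps taken from different layers.

    Each map is resized to a common spatial size, the maps are stacked,
    and a learned per-map weight vector merges them (a 1x1 convolution
    over the stacked channels), followed by a ReLU non-linearity.
    """
    out_h, out_w = out_shape
    stacked = np.stack([resize_nearest(f, out_h, out_w) for f in feature_maps])
    merged = np.tensordot(weights, stacked, axes=1) + bias  # 1x1 conv merge
    return np.maximum(merged, 0.0)                          # ReLU
```

In a real network the weights and bias would be learned end-to-end along with the convolutional layers; the sketch only shows why the output is a non-linear, rather than purely additive, combination of the multi-level features.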