Scale-space multi-view bag of words for scene categorization
Davar Giveki 1

Received: 29 December 2019 / Revised: 27 July 2020 / Accepted: 28 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
As a widely used method for image categorization tasks, the Bag-of-Words (BoW) method still suffers from many limitations, such as overlooking spatial information. In this paper, we propose four improvements to the BoW method that take spatial and semantic information, as well as information from multiple views, into account. In particular, our contributions are: (a) encoding spatial information based on a combination of wavelet transform image scaling and a new image partitioning scheme, (b) proposing a spatial-information- and content-aware visual word dictionary generation approach, (c) developing a content-aware feature weighting approach that considers the significance of the features for different semantics, and (d) proposing a novel weighting strategy to fuse color information when discriminative shape features are lacking. We call our method Scale-Space Multi-View Bag of Words (SSMV-BoW). We conducted extensive experiments to evaluate our SSMV-BoW and compare it to state-of-the-art scene categorization methods. For our experiments, we use four publicly available and widely used scene categorization benchmark datasets. The results demonstrate that our SSMV-BoW outperforms methods using both hand-crafted and deep learning features. In addition, ablation studies show that all four improvements contribute to the performance of our SSMV-BoW.

Keywords: Scene categorization · Bag of words · Scale-space features · Feature fusion · TF-IDF weighting
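For readers who want a concrete picture of the baseline that SSMV-BoW extends, the sketch below outlines a plain bag-of-words pipeline in Python with a Haar-wavelet scale-space and classic TF-IDF weighting. It is a minimal illustration under simplifying assumptions of our own (dense grayscale patches as local descriptors, a k-means dictionary, and the helper names dense_patches, build_dictionary, bow_histograms, and tfidf_weight); it is not the paper's SSMV-BoW implementation, which adds its own partitioning scheme, content-aware dictionary, feature weighting, and color fusion.

# Minimal bag-of-words sketch (not the paper's SSMV-BoW): dense patch
# descriptors, a Haar-wavelet approximation as a coarser scale, a k-means
# visual-word dictionary, and TF-IDF re-weighted histograms.
import numpy as np
import pywt                          # wavelet transform (scale-space levels)
from sklearn.cluster import KMeans   # visual-word dictionary via k-means


def dense_patches(gray, patch=8, stride=8):
    """Flattened grayscale patches as simple local descriptors."""
    h, w = gray.shape
    feats = [gray[y:y + patch, x:x + patch].ravel()
             for y in range(0, h - patch + 1, stride)
             for x in range(0, w - patch + 1, stride)]
    return np.asarray(feats, dtype=np.float32)


def scale_space_descriptors(gray, levels=2):
    """Descriptors from the image and its wavelet approximation bands."""
    feats, current = [dense_patches(gray)], gray
    for _ in range(levels - 1):
        current, _ = pywt.dwt2(current, 'haar')   # keep the approximation band
        feats.append(dense_patches(current))
    return np.vstack(feats)


def build_dictionary(train_grays, k=200):
    """Cluster the pooled training descriptors into k visual words."""
    pooled = np.vstack([scale_space_descriptors(g) for g in train_grays])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled)


def bow_histograms(grays, kmeans):
    """Per-image histogram of visual-word occurrences."""
    k = kmeans.cluster_centers_.shape[0]
    hists = np.zeros((len(grays), k), dtype=np.float64)
    for i, g in enumerate(grays):
        words = kmeans.predict(scale_space_descriptors(g))
        hists[i] = np.bincount(words, minlength=k)
    return hists


def tfidf_weight(hists):
    """Classic TF-IDF re-weighting of BoW histograms."""
    tf = hists / np.maximum(hists.sum(axis=1, keepdims=True), 1.0)
    df = np.maximum((hists > 0).sum(axis=0), 1)      # document frequency
    idf = np.log(hists.shape[0] / df)
    return tf * idf

Given lists of grayscale training and test images, one would call build_dictionary on the training set, then bow_histograms followed by tfidf_weight on each set to obtain feature vectors for a standard classifier such as an SVM.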
* Davar Giveki
[email protected]; [email protected]

1 Department of Computer Engineering, Malayer University, P. O. Box 65719-95863, Malayer, Iran

1 Introduction

The exponential increase in the volume of visual data in recent years has made numerous real-world applications possible, such as content-based image retrieval and categorization [2, 27, 28, 42–45, 75]. Due to the diversity and complexity of the existing visual data, developing effective machine learning (ML) algorithms is of high interest and demand. Therefore, a large number of ML algorithms have been proposed over the past two decades to handle the challenges posed by visual data. Recent advances in methods based on deep learning (DL), e.g., deep convolutional neural networks (CNNs), have greatly improved the performance of state-of-the-art visual recognition algorithms in a variety of computer vision tasks such as classification [12, 3, 25, 35, 60, 80], scene categorization [6, 13, 33, 72], face recognition [21], image restoration [68], pedestrian detection [62], and medical image analysis [54]. In spite of the promising results of DL-based methods, multi-view learning (MVL) is a flourishing direction in the ML domain with strong theoretical underpinnings and great practical success [84]. MVL deals with data represented by multiple distinct feature sets.