Scale-space multi-view bag of words for scene categorization
Davar Giveki 1

Received: 29 December 2019 / Revised: 27 July 2020 / Accepted: 28 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
As a widely used method for image categorization tasks, the Bag-of-Words (BoW) method still suffers from many limitations, such as overlooking spatial information. In this paper, we propose four improvements to the BoW method that take spatial and semantic information, as well as information from multiple views, into account. In particular, our contributions are: (a) encoding spatial information based on a combination of wavelet transform image scaling and a new image partitioning scheme, (b) proposing a spatial-information- and content-aware visual word dictionary generation approach, (c) developing a content-aware feature weighting approach that considers the significance of the features for different semantics, and (d) proposing a novel weighting strategy to fuse color information when discriminative shape features are lacking. We call our method Scale-Space Multi-View Bag of Words (SSMV-BoW). We conducted extensive experiments to evaluate our SSMV-BoW and compare it to state-of-the-art scene categorization methods. For our experiments, we use four publicly available and widely used scene categorization benchmark datasets. The results demonstrate that our SSMV-BoW outperforms methods using both hand-crafted and deep learning features. In addition, ablation studies show that all four improvements contribute to the performance of our SSMV-BoW.

Keywords: Scene categorization · Bag of words · Scale-space features · Feature fusion · TF-IDF weighting
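For readers who want a concrete picture of the baseline that SSMV-BoW extends, the sketch below outlines a plain bag-of-words pipeline in Python with a Haar-wavelet scale-space and classic TF-IDF weighting. It is a minimal illustration under simplifying assumptions of our own (dense grayscale patches as local descriptors, a k-means dictionary, and the helper names dense_patches, build_dictionary, bow_histograms, and tfidf_weight); it is not the paper's SSMV-BoW implementation, which adds its own partitioning scheme, content-aware dictionary, feature weighting, and color fusion.

# Minimal bag-of-words sketch (not the paper's SSMV-BoW): dense patch
# descriptors, a Haar-wavelet approximation as a coarser scale, a k-means
# visual-word dictionary, and TF-IDF re-weighted histograms.
import numpy as np
import pywt                          # wavelet transform (scale-space levels)
from sklearn.cluster import KMeans   # visual-word dictionary via k-means


def dense_patches(gray, patch=8, stride=8):
    """Flattened grayscale patches as simple local descriptors."""
    h, w = gray.shape
    feats = [gray[y:y + patch, x:x + patch].ravel()
             for y in range(0, h - patch + 1, stride)
             for x in range(0, w - patch + 1, stride)]
    return np.asarray(feats, dtype=np.float32)


def scale_space_descriptors(gray, levels=2):
    """Descriptors from the image and its wavelet approximation bands."""
    feats, current = [dense_patches(gray)], gray
    for _ in range(levels - 1):
        current, _ = pywt.dwt2(current, 'haar')   # keep the approximation band
        feats.append(dense_patches(current))
    return np.vstack(feats)


def build_dictionary(train_grays, k=200):
    """Cluster the pooled training descriptors into k visual words."""
    pooled = np.vstack([scale_space_descriptors(g) for g in train_grays])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(pooled)


def bow_histograms(grays, kmeans):
    """Per-image histogram of visual-word occurrences."""
    k = kmeans.cluster_centers_.shape[0]
    hists = np.zeros((len(grays), k), dtype=np.float64)
    for i, g in enumerate(grays):
        words = kmeans.predict(scale_space_descriptors(g))
        hists[i] = np.bincount(words, minlength=k)
    return hists


def tfidf_weight(hists):
    """Classic TF-IDF re-weighting of BoW histograms."""
    tf = hists / np.maximum(hists.sum(axis=1, keepdims=True), 1.0)
    df = np.maximum((hists > 0).sum(axis=0), 1)      # document frequency
    idf = np.log(hists.shape[0] / df)
    return tf * idf

Given lists of grayscale training and test images, one would call build_dictionary on the training set, then bow_histograms followed by tfidf_weight on each set to obtain feature vectors for a standard classifier such as an SVM.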
* Davar Giveki
[email protected]; [email protected]

1 Department of Computer Engineering, Malayer University, P. O. Box 65719-95863, Malayer, Iran

1 Introduction

The exponential increase in the volume of visual data in recent years has made numerous real-world applications possible, such as content-based image retrieval and categorization [2, 27, 28, 42–45, 75]. Due to the diversity and complexity of the existing visual data, developing effective machine learning (ML) algorithms is of high interest and demand. Therefore, a large number of ML algorithms have been proposed over the past two decades to handle the challenges posed by visual data. Recent advances in methods based on deep learning (DL), e.g., deep convolutional neural networks (CNNs), have greatly improved the performance of state-of-the-art visual recognition algorithms in a variety of computer vision tasks such as classification [12, 3, 25, 35, 60, 80], scene categorization [6, 13, 33, 72], face recognition [21], image restoration [68], pedestrian detection [62], and medical image analysis [54]. In spite of the promising results of DL-based methods, multi-view learning (MVL) is a flourishing direction in the ML domain with strong theoretical underpinnings and great practical success [84]. MVL deals with data represented by multiple distinct feature sets.