LSTM and multiple CNNs based event image classification


Peian Li 1,2, Huadong Tang 2, Jing Yu 2, Wei Song 1,3

Received: 24 March 2020 / Revised: 26 September 2020 / Accepted: 10 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Previous studies have demonstrated that the complexity and variation of event images are the major challenges in event classification. We approach the problem with an integrated methodology that uses a Long Short-Term Memory (LSTM) network to fuse multiple Convolutional Neural Networks (CNNs). To address the issue of complexity, we use three dedicated CNNs to extract scene, object and human visual cues, respectively. To reduce the semantic gap and exploit the complementarity of features at different levels, we choose AlexNet and VGG-16 as the basic structures and concatenate the outputs of their first and second fully-connected layers. Considering the contextual correlations between visual cues, we arrange the concatenated features of the three CNNs in the order scene, object and human, and feed them as a whole into the LSTM network. To capture context in particular, we crop each image into five blocks as input, so that an individual image is supplemented with contextual features through the temporal characteristics of the LSTM. We evaluate our method on the Web Image Dataset for Event Recognition (WIDER), and the results demonstrate the effectiveness of each of the above points. Compared with state-of-the-art methods, the proposed method provides an effective way to improve performance on event classification.

Keywords: Event classification · Convolutional neural networks · Long short-term memory · Feature combination · Context information
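The fusion pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. Several details are assumptions that the abstract does not specify: the five blocks are taken to be the four corners plus the center, each block is treated as one LSTM time step, the CNN outputs are replaced by a hypothetical stub (`stub_fc_features`), and the feature and hidden sizes are toy values.

```python
import math
import random

CUES = ("scene", "object", "human")  # cue order stated in the abstract
FEAT_DIM = 4   # toy size; real fully-connected layers are far larger
HIDDEN = 8     # toy LSTM hidden size (assumption)

def five_crop_boxes(width, height, crop_w, crop_h):
    """Return five (left, top, right, bottom) boxes: four corners, then center.
    The corner-plus-center layout is an assumption, not taken from the paper."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop must fit inside the image")
    cx, cy = (width - crop_w) // 2, (height - crop_h) // 2
    origins = [(0, 0), (width - crop_w, 0), (0, height - crop_h),
               (width - crop_w, height - crop_h), (cx, cy)]
    return [(x, y, x + crop_w, y + crop_h) for x, y in origins]

def stub_fc_features(cue, box, layer):
    """Hypothetical stand-in for one CNN's fully-connected layer output."""
    rng = random.Random(f"{cue}|{box}|{layer}")  # deterministic toy values
    return [rng.random() for _ in range(FEAT_DIM)]

def fused_step(box):
    """Concatenate fc1 and fc2 features per cue, in scene/object/human order."""
    step = []
    for cue in CUES:
        step += stub_fc_features(cue, box, "fc1")
        step += stub_fc_features(cue, box, "fc2")
    return step

def init_lstm(input_dim, hidden, seed=0):
    """Small random weights for the four gates; biases start at zero."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden)]
         for _ in range(4 * hidden)]
    return W, [0.0] * (4 * hidden)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W, b):
    """One standard LSTM cell update (input, forget, output, candidate)."""
    n = len(h)
    z = x + h  # concatenate the input with the previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:n]]
    f = [sigmoid(p) for p in pre[n:2 * n]]
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def encode_image(width, height, crop=224):
    """Build the five-step fused sequence and run it through the LSTM."""
    sequence = [fused_step(box)
                for box in five_crop_boxes(width, height, crop, crop)]
    W, b = init_lstm(len(sequence[0]), HIDDEN)
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return sequence, h
```

Calling `encode_image(256, 256)` returns the five fused feature vectors and the final hidden state. In a full system the hidden state would feed a classifier over the event categories, and the stub would be replaced by the actual scene, object and human networks.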

* Wei Song songwei@muc.edu.cn

1 School of Information Engineering, Minzu University of China, Beijing, China

2 School of Electronic Information and Engineering, Beijing Jiaotong University, Beijing, China

3 National Language Resource Monitoring and Research Center of Minority Languages, Minzu University of China, Beijing, China

Multimedia Tools and Applications

1 Introduction

In computer vision, image classification has long been one of the most prominent topics and has attracted extensive attention from researchers. Event categorization in still images is a very challenging problem because events involve multiple interacting characteristics, and the description of an event is both complicated and variable [2]. In general, the concept of an event is highly correlated with many other high-level visual cues: the content of an event image involves various kinds of visual information, such as humans, objects and scenes. In the early stage, features were extracted from images for classification manually, using Scale Invariant Feature Transform (SIFT) [24], Histogram of Oriented Gradient (HOG) [23] and other algorithms [16, 32, 40]. However, these methods, which are based on hand-engineered features, generalize poorly. Until 2012, Alex krizh
