Towards Effective Classification of Imbalanced Data with Convolutional Neural Networks



Abstract. Class imbalance in machine learning is a problem often found with real-world data, where data from one class clearly dominates the dataset. Because of a strong bias towards the majority class, most neural network classifiers fail to learn to classify such datasets correctly if class-to-class separability is poor. In this paper we present an algorithmic solution, integrating different methods into a novel approach using a class-to-class separability score, to increase performance on poorly separable, imbalanced datasets using Cost Sensitive Neural Networks. We compare different cost functions and methods that can be used for training Convolutional Neural Networks on a highly imbalanced dataset of multi-channel time series data. Results show that, although the data are imbalanced and poorly separable, performance metrics such as G-Mean as high as 92.8% could be reached by using cost-sensitive Convolutional Neural Networks to detect patterns and correctly classify time series from 3 different datasets.
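The G-Mean metric named in the abstract is not defined in this excerpt. The sketch below assumes the common binary definition, the geometric mean of sensitivity and specificity, and uses hypothetical toy labels (95 majority, 5 minority examples) to illustrate why plain accuracy can look good on imbalanced data while G-Mean exposes a classifier that ignores the minority class. The function names and data are illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TPR * TNR); the paper's exact formula is not
    given in this excerpt.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1) examples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of 5 minority examples at the cost of 5 false alarms.
minority_aware = [0] * 90 + [1] * 5 + [1] * 4 + [0]

print("always majority:", accuracy(y_true, always_majority), g_mean(y_true, always_majority))
print("minority aware: ", accuracy(y_true, minority_aware), g_mean(y_true, minority_aware))
```

In this toy setup the always-majority classifier has the higher accuracy (0.95 versus 0.94) but a G-Mean of 0, because it never detects a minority example, while the minority-aware classifier scores roughly 0.87.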

1 Introduction

In supervised classification tasks, effective learning happens when there are sufficient examples for all the classes and class-to-class (C2C) separability is sufficiently large. However, real-world datasets are often imbalanced and have poor C2C separability. A dataset is said to be imbalanced when a certain class is overrepresented compared to the other classes in that dataset. In binary classification tasks, the overrepresented class is often referred to as the majority class and the other as the minority class. Machine Learning algorithms performing classification on such datasets face the so-called 'class imbalance problem', where learning is not as effective as it is with a balanced dataset [6,10,13], since the imbalance introduces a bias in learning towards the majority class. On the one hand, many real-world datasets are imbalanced; on the other hand, most existing classification approaches assume that the underlying training set is evenly distributed. Furthermore, in many scenarios it is undesirable or dangerous to misclassify an example from a minority class. For example, in a continuous surveillance task, suspicious activity may occur as a rare event that must not go unnoticed by the monitoring system. In medical applications, erroneously classifying a sick person as healthy can carry a larger risk (cost) than wrongly classifying a healthy person as sick. In these cases it is crucial for classification algorithms to have a higher identification rate for rare events; that is, it is critical not to misclassify any minority examples, while it is acceptable to misclassify a few majority examples. An extreme example of the imbalance problem would be a dataset where the area of the majority class overlaps that of the minority class completely and the overlapping region contains as many

© Springer International Publishing AG 2016. F. Schwenker et al. (Eds.): ANNPR 2016, LNAI 9896, pp. 150–162, 2016. DOI: 10.1007/978-3-319-46182-3_13
