Output Layer Multiplication for Class Imbalance Problem in Convolutional Neural Networks


Zhao Yang¹ · Yuanxin Zhu¹ · Tie Liu¹ · Sai Zhao¹ · Yunyan Wang² · Dapeng Tao³

Accepted: 4 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Convolutional neural networks (CNNs) have demonstrated remarkable performance in computer vision. However, they are prone to the class imbalance problem, in which some classes have significantly more or fewer examples than others. There are two main strategies for handling the problem: dataset-level methods based on resampling, and algorithm-level methods that modify existing learning frameworks. Most of these methods, however, require extra data resampling or elaborate algorithm design. In this work we provide an effective but extremely simple approach to the imbalance problem in CNNs trained with cross-entropy loss. Specifically, we multiply the output of the last layer of a CNN model by a coefficient α > 1. With this modification, the final loss function can dynamically adjust the contributions of examples from different classes during imbalanced training. Because of its simplicity, the proposed method can be applied to off-the-shelf models with little change. To demonstrate its effectiveness on the imbalance problem, we design three experiments on classification tasks of increasing complexity. The experimental results show that our approach can improve the convergence rate during training and/or increase test accuracy.

Keywords Convolutional neural networks · Imbalance learning · Output layer multiplication
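The output-layer multiplication described in the abstract can be sketched in a few lines. The following is a minimal illustration and not the authors' implementation: the function names, the example logits, and the value `alpha=2.0` are assumptions chosen for demonstration; the abstract only requires α > 1. The sketch scales the last-layer outputs (logits) by α before the usual softmax cross-entropy.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_cross_entropy(logits, target, alpha=2.0):
    """Cross-entropy computed on logits multiplied by alpha.

    alpha=1.0 recovers the standard softmax cross-entropy.
    alpha=2.0 is an illustrative default, not a value taken from the paper.
    """
    scaled_logits = [alpha * z for z in logits]
    probs = softmax(scaled_logits)
    return -math.log(probs[target])

# Hypothetical last-layer outputs for a 3-class problem (true class is 0).
confident = [4.0, 0.0, 0.0]  # example that is already well separated
uncertain = [0.5, 0.0, 0.0]  # example that is barely separated

for name, z in [("confident", confident), ("uncertain", uncertain)]:
    plain = scaled_cross_entropy(z, 0, alpha=1.0)
    scaled = scaled_cross_entropy(z, 0, alpha=2.0)
    print(f"{name}: loss(alpha=1)={plain:.4f}  loss(alpha=2)={scaled:.4f}")
```

In this toy setting, scaling by α shrinks the loss of the well-separated example by a far larger factor than that of the barely separated one, so the relative contribution of the harder example to the total loss grows. This is one way to read the claim that the modified loss adjusts the contributions of different examples; how the effect plays out across classes in real imbalanced training is the subject of the paper's experiments.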

✉ Dapeng Tao [email protected]

¹ School of Mechanical and Electric Engineering, Guangzhou University, Guangzhou, People’s Republic of China
² School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, People’s Republic of China
³ School of Information Science and Engineering, Yunnan University, Kunming, People’s Republic of China

1 Introduction Convolutional neural networks (CNNs) have attracted increasing attention in the computer vision community owing to their state-of-the-art performance on a variety of vision problems, including image classification [21, 29], object detection [36, 37], and semantic segmentation [9, 38]. Despite this success, CNNs have been shown to be prone to the class imbalance problem [5], which arises in many practical applications. For instance, object detection tasks [34, 39, 40] have an inevitable foreground–background class imbalance, because the vast majority of bounding boxes are labeled as background and only a few contain specific objects. In facial attribute recognition [12, 24, 30], there is a severe imbalance between different attributes, as most datasets are collected from the Internet using search engines without careful manual selection. Besides, many real-world datasets [10, 11, 32, 49, 56] exhibit roughly imbalanced or sk