An Analysis of Activation Function Saturation in Particle Swarm Optimization Trained Neural Networks
Cody Dennis¹ · Andries P. Engelbrecht² · Beatrice M. Ombuki-Berman¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
The activation functions used in an artificial neural network define how the nodes of the network respond to input, directly influence the shape of the error surface, and play a role in the difficulty of the neural network training problem. The choice of activation function is therefore a significant decision that must be made when applying a neural network to a problem. One issue that must be considered when selecting an activation function is activation function saturation. Saturation occurs when a bounded activation function primarily outputs values close to its boundary. Excessive saturation damages the network's ability to encode information and may prevent successful training. Common functions such as the logistic and hyperbolic tangent functions have been shown to exhibit saturation when the neural network is trained using particle swarm optimization. This study proposes a new measure of activation function saturation, evaluates the saturation behavior of eight common activation functions, and evaluates six mechanisms for controlling activation function saturation in particle swarm optimization based neural network training. Activation functions that result in low levels of saturation are identified, and for each activation function, recommendations are made regarding which saturation control mechanism is most effective at reducing saturation.

Keywords: Feed forward neural network · Particle swarm optimization · Saturation · Activation function
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11063-020-10290-z) contains supplementary material, which is available to authorized users.
Beatrice M. Ombuki-Berman (corresponding author)
[email protected]

Cody Dennis
[email protected]

Andries P. Engelbrecht
[email protected]
¹ Department of Computer Science, Brock University, St. Catharines, Canada

² Department of Industrial Engineering and Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa
1 Introduction

A feedforward neural network (FFNN) is a simple model inspired by the mammalian brain [2]. A corpus of known information can be encoded within such a model; the process of encoding this information is known as training the network. During training, the network identifies patterns and relationships in the data, which gives it the ability to generalize: after successful training, the network is able to produce appropriate responses to novel inputs [48]. This ability has made FFNNs popular for many real-world problems in fields such as forecasting, medicine, and finance, among others [38]. Gradient descent [50] is perhaps the most commonly used algorithm for training FFNNs [1,2,54]. However, a variety of works have shown that particle swarm optimization (PSO) [26] is an effective alternative.
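To make the notion of saturation concrete, the sketch below measures how often a bounded activation function outputs values near its bounds. This is a minimal illustration, not the measure proposed in this study: the use of the hyperbolic tangent, the 0.9 threshold, and the helper names are all assumptions introduced here for demonstration.

import numpy as np

def tanh_layer(x, W, b):
    # One hidden layer with the hyperbolic tangent activation,
    # which is bounded in (-1, 1).
    return np.tanh(x @ W + b)

def saturation_fraction(activations, bound=1.0, threshold=0.9):
    # Fraction of activations whose magnitude lies within 90% of the
    # function's bound. A simple frequency-based proxy for saturation;
    # the paper's proposed measure differs in detail.
    return float(np.mean(np.abs(activations) >= threshold * bound))

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))  # 1000 samples, 10 input features
b = np.zeros(5)

# Small weights keep pre-activations near zero: little saturation.
W_small = 0.1 * rng.standard_normal((10, 5))
# Large weights push pre-activations into the tails of tanh,
# so almost every output sits near -1 or +1.
W_large = 5.0 * rng.standard_normal((10, 5))

print(saturation_fraction(tanh_layer(x, W_small, b)))  # near 0.0
print(saturation_fraction(tanh_layer(x, W_large, b)))  # near 1.0

An analogous check for the logistic function would count outputs near its bounds of 0 and 1; the contrast between the two weight scales shows why weight magnitude, and hence the training algorithm, interacts with saturation.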