An Analysis of Activation Function Saturation in Particle Swarm Optimization Trained Neural Networks
Cody Dennis¹ · Andries P. Engelbrecht² · Beatrice M. Ombuki-Berman¹
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
The activation functions used in an artificial neural network define how the nodes of the network respond to input, directly influence the shape of the error surface, and play a role in the difficulty of the neural network training problem. The choice of activation function is therefore a significant decision that must be made when applying a neural network to a problem. One issue that must be considered when selecting an activation function is activation function saturation. Saturation occurs when a bounded activation function primarily outputs values close to its boundary. Excessive saturation damages the network's ability to encode information and may prevent successful training. Common functions such as the logistic and hyperbolic tangent functions have been shown to exhibit saturation when the neural network is trained using particle swarm optimization. This study proposes a new measure of activation function saturation, evaluates the saturation behavior of eight common activation functions, and evaluates six mechanisms for controlling activation function saturation in particle swarm optimization based neural network training. Activation functions that result in low levels of saturation are identified, and for each activation function, recommendations are made regarding which saturation control mechanism is most effective at reducing saturation.

Keywords: Feed forward neural network · Particle swarm optimization · Saturation · Activation function
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11063-020-10290-z) contains supplementary material, which is available to authorized users.
Beatrice M. Ombuki-Berman (corresponding author)
[email protected]

Cody Dennis
[email protected]

Andries P. Engelbrecht
[email protected]
¹ Department of Computer Science, Brock University, St. Catharines, Canada

² Department of Industrial Engineering and Department of Computer Science, Stellenbosch University, Stellenbosch, South Africa
1 Introduction

A feedforward neural network (FFNN) is a simple model inspired by the mammalian brain [2]. A corpus of known information can be encoded within such a model; the process of encoding this information is known as training the network. During training, the network identifies patterns and relationships in the data, which gives it the ability to generalize: after successful training, the network is able to produce appropriate responses to novel inputs [48]. This ability has made FFNNs popular for many real-world problems in fields such as forecasting, medicine, and finance, among others [38]. Gradient descent [50] is perhaps the most commonly used algorithm for training FFNNs [1,2,54]. However, a variety of works have shown that particle swarm optimization (PSO) [26] is an effective alternative.
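To make the notion of saturation concrete, the sketch below measures how often a bounded activation function outputs values near its bounds. This is a minimal illustration, not the measure proposed in this study: the use of the hyperbolic tangent, the 0.9 threshold, and the helper names are all assumptions introduced here for demonstration.

import numpy as np

def tanh_layer(x, W, b):
    # One hidden layer with the hyperbolic tangent activation,
    # which is bounded in (-1, 1).
    return np.tanh(x @ W + b)

def saturation_fraction(activations, bound=1.0, threshold=0.9):
    # Fraction of activations whose magnitude lies within 90% of the
    # function's bound. A simple frequency-based proxy for saturation;
    # the paper's proposed measure differs in detail.
    return float(np.mean(np.abs(activations) >= threshold * bound))

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))  # 1000 samples, 10 input features
b = np.zeros(5)

# Small weights keep pre-activations near zero: little saturation.
W_small = 0.1 * rng.standard_normal((10, 5))
# Large weights push pre-activations into the tails of tanh,
# so almost every output sits near -1 or +1.
W_large = 5.0 * rng.standard_normal((10, 5))

print(saturation_fraction(tanh_layer(x, W_small, b)))  # near 0.0
print(saturation_fraction(tanh_layer(x, W_large, b)))  # near 1.0

An analogous check for the logistic function would count outputs near its bounds of 0 and 1; the contrast between the two weight scales shows why weight magnitude, and hence the training algorithm, interacts with saturation.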