Convergence of Batch Gradient Method Based on the Entropy Error Function for Feedforward Neural Networks



Yan Xiong1 · Xin Tong2

1 Faculty of Science, University of Science and Technology Liaoning, 114051 Anshan, China
2 Yingkou Institute of Technology, 115001 Yingkou, China

Accepted: 8 October 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract The gradient method is often used for feedforward neural network training. Most studies so far have focused on the square error function. In this paper, a novel entropy error function is proposed for feedforward neural network training. Weak and strong convergence of the gradient method based on the entropy error function with batch input training patterns are rigorously proved. Numerical examples are given at the end of the paper to verify the effectiveness and correctness of the method. Compared with the square error function, our method provides both faster learning and better generalization for the given test problems.

Keywords Gradient method · Batch input training · Entropy error function · Convergence · Monotonicity

1 Introduction

In the past two decades, feedforward neural networks have been widely applied to function approximation, prediction and data classification, and they have played an important role in research on neural network theory [1–4] and on the modeling and control of nonlinear systems [5–7]. Various optimization techniques have been used for the weight learning of feedforward neural networks, among which the gradient method is a simple and widely used one. Generally speaking, training with the gradient method proceeds by iteratively updating the weights along the negative gradient of the error function, minimizing the error value through repeated training so as to capture the input–output relationship of the network. The square error function is usually selected as the error function for feedforward neural networks.
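As a rough illustration of this standard training procedure (not the exact formulation analyzed in this paper), the following minimal Python sketch trains a one-hidden-layer feedforward network by batch gradient descent on the square error function; the network size, sigmoid activation and learning rate eta are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_batch_gradient(X, T, hidden=8, eta=0.1, epochs=1000, seed=0):
    # X: (n_samples, n_in) input patterns, T: (n_samples, n_out) target outputs.
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(X.shape[1], hidden))   # input-to-hidden weights
    W = rng.normal(scale=0.1, size=(hidden, T.shape[1]))   # hidden-to-output weights
    for _ in range(epochs):
        H = sigmoid(X @ V)                         # hidden-layer outputs for the whole batch
        Y = sigmoid(H @ W)                         # network outputs
        # Square error function E = (1/2) * sum((Y - T)**2); its gradient is
        # back-propagated through the two sigmoid layers below.
        d_out = (Y - T) * Y * (1.0 - Y)            # output-layer error term
        d_hid = (d_out @ W.T) * H * (1.0 - H)      # hidden-layer error term
        W -= eta * (H.T @ d_out)                   # batch update along the negative gradient
        V -= eta * (X.T @ d_hid)
    return V, W

Switching from the square error to an entropy-type error function changes only the form of the output-layer error term d_out; the rest of the batch update scheme stays the same.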


There have been many thorough results on the convergence and stability of the gradient method based on the square error function, e.g. [8–12] among many others. However, the square error surface of a complex network is highly convoluted, full of hills and valleys in high-dimensional space, so network training can suffer from a slow convergence rate and can easily become trapped in a local minimum, or even run into the incorrect saturation problem in practice. The choice of error function therefore has a great influence on network performance: different error functions correspond to different gradient functions over the training samples, and this choice is a key factor in determining the convergence, stability and classification accuracy of a neural network [18]. Some studies have consequently turned to other error functions. The entropy error function was originally proposed by Karayiannis [13] in 1992 and further modified by Oh [14] in 1997. Oh came to the