Multilayer Perceptrons: Architecture and Error Backpropagation
4.1 Introduction

MLPs are feedforward networks with one or more layers of units between the input and output layers. The output units represent a hyperplane in the space of the input patterns. The architecture of the MLP is illustrated in Fig. 4.1. Assume that there are $M$ layers, the $m$th layer having $J_m$ nodes, $m = 1, \ldots, M$. The weights from the $(m-1)$th layer to the $m$th layer are denoted by $W^{(m-1)}$; the bias, output, and activation function of the $i$th neuron in the $m$th layer are denoted by $\theta_i^{(m)}$, $o_i^{(m)}$, and $\phi_i^{(m)}(\cdot)$, respectively. An MLP trained with the BP algorithm is also called a BP network. The MLP can be used for classification of linearly inseparable patterns and for function approximation.

From Fig. 4.1, we have the following relations; notice that a plus sign precedes the bias vector for ease of presentation. For $m = 2, \ldots, M$, and the $p$th example,

$$\hat{y}_p = o_p^{(M)}, \qquad o_p^{(1)} = x_p, \tag{4.1}$$

$$\mathrm{net}_p^{(m)} = \left( W^{(m-1)} \right)^T o_p^{(m-1)} + \theta^{(m)}, \tag{4.2}$$

$$o_p^{(m)} = \phi^{(m)}\!\left( \mathrm{net}_p^{(m)} \right), \tag{4.3}$$

where $\mathrm{net}_p^{(m)} = \left( \mathrm{net}_{p,1}^{(m)}, \ldots, \mathrm{net}_{p,J_m}^{(m)} \right)^T$, $W^{(m-1)}$ is a $J_{m-1}$-by-$J_m$ matrix, $o_p^{(m-1)} = \left( o_{p,1}^{(m-1)}, \ldots, o_{p,J_{m-1}}^{(m-1)} \right)^T$, $\theta^{(m)} = \left( \theta_1^{(m)}, \ldots, \theta_{J_m}^{(m)} \right)^T$ is the bias vector, and $\phi^{(m)}(\cdot)$ applies $\phi_i^{(m)}(\cdot)$ to the $i$th component of its vector argument. All $\phi_i^{(m)}(\cdot)$ are typically selected to be the same sigmoidal function; one can also select all $\phi_i^{(m)}(\cdot)$ in the first $M-1$ layers as the same sigmoidal function, and all $\phi_i^{(M)}(\cdot)$ in the $M$th layer as another continuous yet differentiable function.
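The forward relations (4.1)–(4.3) map directly onto a loop over the layers. The following is a minimal NumPy sketch of that forward pass, assuming the logistic sigmoid as the common activation in every layer; the function and variable names are illustrative, and the weight matrices are stored in the $J_{m-1}$-by-$J_m$ orientation used above, hence the transpose in the matrix–vector product.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation phi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, biases):
    """Forward pass of an M-layer MLP, Eqs. (4.1)-(4.3).

    weights[k] holds the J_{k+1}-by-J_{k+2} matrix W^{(k+1)} (0-based list
    index k); biases[k] holds the bias vector theta^{(k+2)}.  Returns the
    network output o^{(M)} together with all layer outputs, which the
    backpropagation pass later reuses.
    """
    o = np.asarray(x, dtype=float)   # o^(1) = x_p, Eq. (4.1)
    outputs = [o]
    for W, theta in zip(weights, biases):
        net = W.T @ o + theta        # net^(m) = (W^(m-1))^T o^(m-1) + theta^(m), Eq. (4.2)
        o = sigmoid(net)             # o^(m) = phi^(m)(net^(m)), Eq. (4.3)
        outputs.append(o)
    return o, outputs                # y_hat_p = o^(M), Eq. (4.1)

# Example: a three-layer MLP with J_1 = 2 inputs, J_2 = 3 hidden units, J_3 = 1 output.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 1))]
biases = [np.zeros(3), np.zeros(1)]
y_hat, _ = mlp_forward([0.5, -1.0], weights, biases)
```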
Fig. 4.1 Architecture of MLP. Inputs $x_1, \ldots, x_{J_1}$ pass through successive layers of summing units with weight matrices $W^{(1)}, \ldots, W^{(M-1)}$, biases $\theta^{(2)}, \ldots, \theta^{(M)}$, and activation functions $\phi^{(2)}(\cdot), \ldots, \phi^{(M)}(\cdot)$; the layer outputs are $o^{(2)}, \ldots, o^{(M)}$, and the network outputs are $y_1, \ldots, y_{J_M}$
4.2 Universal Approximation

The MLP is a universal approximator. Its universal approximation capability stems from the nonlinearities used in the nodes. The universal approximation capability of four-layer MLPs has been addressed in [44, 114]. It has been mathematically proved that a three-layer MLP using a sigmoidal activation function can approximate any continuous multivariate function to any accuracy [22, 33, 43, 136]. A four-layer network can usually approximate the target with fewer connection weights; this may, however, introduce extra local minima [16, 114, 136]. Xiang et al. provided a geometrical interpretation of the MLP on the basis of the special geometrical shape of the activation function: for a target function with a flat surface located in the domain, a small four-layer MLP can generate better results [136]. The MLP is very efficient for function approximation in high-dimensional spaces, since its error convergence rate is independent of the input dimensionality, whereas conventional linear regression methods suffer from the curse of dimensionality.
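As a concrete, informal illustration of this approximation capability, the sketch below fits a single hidden layer of sigmoidal units to the continuous target $\sin(x)$ on $[0, 2\pi]$ and reports the resulting maximum error. The number of hidden units, the sigmoid slope, and the use of a least-squares fit of the output weights alone are illustrative simplifications chosen for brevity; they are not the constructive results cited above.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation used by each hidden unit."""
    return 1.0 / (1.0 + np.exp(-x))

# Sample the domain and the continuous target function to approximate.
x = np.linspace(0.0, 2.0 * np.pi, 400)
target = np.sin(x)

# One hidden layer of sigmoidal units with centres spread over the domain
# (illustrative choices: 30 units, slope 8).
centres = np.linspace(0.0, 2.0 * np.pi, 30)
slope = 8.0
H = sigmoid(slope * (x[:, None] - centres[None, :]))   # hidden-layer outputs

# Linear output layer: fit its weights (plus a bias column) by least squares.
A = np.column_stack([H, np.ones_like(x)])
w, *_ = np.linalg.lstsq(A, target, rcond=None)
approx = A @ w

# The maximum absolute error is small for this smooth target and shrinks
# further as more hidden units are used.
print("max |error| =", np.max(np.abs(approx - target)))
```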