Multilayer Perceptrons: Architecture and Error Backpropagation
4.1 Introduction

MLPs are feedforward networks with one or more layers of units between the input and output layers. The output units represent a hyperplane in the space of the input patterns. The architecture of the MLP is illustrated in Fig. 4.1. Assume that there are $M$ layers, the $m$th layer having $J_m$ nodes, $m = 1, \ldots, M$. The weights from the $(m-1)$th layer to the $m$th layer are denoted by $\mathbf{W}^{(m-1)}$; the bias, output, and activation function of the $i$th neuron in the $m$th layer are denoted by $\theta_i^{(m)}$, $o_i^{(m)}$, and $\phi_i^{(m)}(\cdot)$, respectively. An MLP trained with the BP algorithm is also called a BP network. The MLP can be used for classification of linearly inseparable patterns and for function approximation.

From Fig. 4.1, we have the following relations; notice that a plus sign precedes the bias vector for ease of presentation. For $m = 2, \ldots, M$, and the $p$th example:

$$\hat{\mathbf{y}}_p = \mathbf{o}_p^{(M)}, \qquad \mathbf{o}_p^{(1)} = \mathbf{x}_p, \tag{4.1}$$

$$\mathbf{net}_p^{(m)} = \left(\mathbf{W}^{(m-1)}\right)^{\mathrm{T}} \mathbf{o}_p^{(m-1)} + \boldsymbol{\theta}^{(m)}, \tag{4.2}$$

$$\mathbf{o}_p^{(m)} = \boldsymbol{\phi}^{(m)}\!\left(\mathbf{net}_p^{(m)}\right), \tag{4.3}$$

where $\mathbf{net}_p^{(m)} = \left(net_{p,1}^{(m)}, \ldots, net_{p,J_m}^{(m)}\right)^{\mathrm{T}}$, $\mathbf{W}^{(m-1)}$ is a $J_{m-1}$-by-$J_m$ matrix, $\mathbf{o}_p^{(m-1)} = \left(o_{p,1}^{(m-1)}, \ldots, o_{p,J_{m-1}}^{(m-1)}\right)^{\mathrm{T}}$, $\boldsymbol{\theta}^{(m)} = \left(\theta_1^{(m)}, \ldots, \theta_{J_m}^{(m)}\right)^{\mathrm{T}}$ is the bias vector, and $\boldsymbol{\phi}^{(m)}(\cdot)$ applies $\phi_i^{(m)}(\cdot)$ to the $i$th component of its vector argument. All $\phi_i^{(m)}(\cdot)$ are typically selected to be the same sigmoidal function; one can also select all $\phi_i^{(m)}(\cdot)$ in the first $M-1$ layers as the same sigmoidal function, and all $\phi_i^{(M)}(\cdot)$ in the $M$th layer as another continuous and differentiable function.
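To make the forward relations concrete, the following is a minimal NumPy sketch of the forward pass defined by Eqs. (4.1)-(4.3). The function name mlp_forward, the sigmoid helper, and the small 3-2-1 example network are illustrative choices, not taken from the text.

import numpy as np

def sigmoid(v):
    # Logistic sigmoid, applied element-wise.
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x_p, weights, biases, activations):
    # Forward pass through an M-layer MLP following Eqs. (4.1)-(4.3).
    #   x_p         : input vector, o_p^(1) = x_p                           (4.1)
    #   weights     : [W^(1), ..., W^(M-1)], W^(m-1) of shape (J_{m-1}, J_m)
    #   biases      : [theta^(2), ..., theta^(M)], theta^(m) of length J_m
    #   activations : [phi^(2), ..., phi^(M)], element-wise functions
    # Returns y_hat_p = o_p^(M).
    o = np.asarray(x_p, dtype=float)
    for W, theta, phi in zip(weights, biases, activations):
        net = W.T @ o + theta   # net_p^(m) = (W^(m-1))^T o_p^(m-1) + theta^(m)   (4.2)
        o = phi(net)            # o_p^(m) = phi^(m)(net_p^(m))                    (4.3)
    return o

# Toy 3-2-1 network (M = 3 layers) with sigmoidal units in every layer.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((2, 1))]
biases = [np.zeros(2), np.zeros(1)]
print(mlp_forward([0.5, -1.0, 2.0], weights, biases, [sigmoid, sigmoid]))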
 
[Fig. 4.1 omitted in this extraction: a layered feedforward network in which the inputs x_1, ..., x_{J_1} pass through the weight matrices W^{(1)}, ..., W^{(M-1)}; each layer sums its weighted inputs, adds the bias theta^{(m)}, and applies phi^{(m)}(.) to produce the layer outputs o_1^{(m)}, ..., o_{J_m}^{(m)}, and the final layer yields the network outputs y_1, ..., y_{J_M}.]

Fig. 4.1 Architecture of MLP
 
4.2 Universal Approximation

The MLP is a universal approximator. Its universal approximation capability stems from the nonlinearities used in the nodes. The universal approximation capability of four-layer MLPs has been addressed in [44, 114]. It has been mathematically proved that a three-layer MLP using a sigmoidal activation function can approximate any continuous multivariate function to any desired accuracy [22, 33, 43, 136]. A four-layer network can usually approximate the target with fewer connection weights; this may, however, introduce extra local minima [16, 114, 136]. Xiang et al. provided a geometrical interpretation of the MLP based on the special geometrical shape of the activation function: for a target function with a flat surface located in the domain, a small four-layer MLP can generate better results [136].

The MLP is very efficient for function approximation in high-dimensional spaces. The error convergence rate of the MLP is independent of the input dimensionality, while conventional linear regression methods suffer from the curse of dimensionality.
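As a small illustration of this approximation capability (a sketch, not an implementation from the text), the following NumPy code trains a three-layer MLP, with one hidden layer of sigmoidal units and a linear output unit, by batch gradient descent to approximate sin(x) on [-pi, pi]. The hidden-layer width, learning rate, number of epochs, and target function are all arbitrary choices made for the example.

import numpy as np

# Target: approximate f(x) = sin(x) on [-pi, pi] with a three-layer MLP
# (inputs, one hidden layer of sigmoidal units, a linear output unit),
# trained by batch gradient descent on the squared error.
rng = np.random.default_rng(1)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)   # training inputs, shape (N, 1)
y = np.sin(x)                                        # training targets

n_hidden = 20                                  # hidden-layer width (arbitrary choice)
W1 = rng.standard_normal((1, n_hidden))        # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1))  # hidden-to-output weights
b2 = np.zeros(1)
lr = 0.1                                       # learning rate (arbitrary choice)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

for epoch in range(30000):
    h = sigmoid(x @ W1 + b1)                 # hidden-layer outputs, shape (N, n_hidden)
    y_hat = h @ W2 + b2                      # linear output layer
    err = y_hat - y                          # residuals
    # Gradients of the (half) mean-squared error, back-propagated layer by layer.
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)        # error signal at the hidden layer
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    # Gradient-descent updates.
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("final training MSE:", float(np.mean((y_hat - y) ** 2)))

Widening the hidden layer enlarges the class of functions the network can represent, in line with the approximation results cited above.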