Multilayer Perceptrons: Architecture and Error Backpropagation


4.1 Introduction

MLPs are feedforward networks with one or more layers of units between the input and output layers. The output units represent a hyperplane in the space of the input patterns. The architecture of the MLP is illustrated in Fig. 4.1. Assume that there are $M$ layers, each having $J_m$, $m = 1, \ldots, M$, nodes. The weights from the $(m-1)$th layer to the $m$th layer are denoted by $\mathbf{W}^{(m-1)}$; the bias, output, and activation function of the $i$th neuron in the $m$th layer are denoted by $\theta_i^{(m)}$, $o_i^{(m)}$, and $\phi_i^{(m)}(\cdot)$, respectively. An MLP trained with the BP algorithm is also called a BP network. The MLP can be used for classification of linearly inseparable patterns and for function approximation.

From Fig. 4.1, we have the following relations; note that a plus sign precedes the bias vector for ease of presentation. For $m = 2, \ldots, M$, and the $p$th example:

$$\hat{\mathbf{y}}_p = \mathbf{o}_p^{(M)}, \qquad \mathbf{o}_p^{(1)} = \mathbf{x}_p, \tag{4.1}$$

$$\mathbf{net}_p^{(m)} = \left(\mathbf{W}^{(m-1)}\right)^{\mathrm{T}} \mathbf{o}_p^{(m-1)} + \boldsymbol{\theta}^{(m)}, \tag{4.2}$$

$$\mathbf{o}_p^{(m)} = \boldsymbol{\phi}^{(m)}\!\left(\mathbf{net}_p^{(m)}\right), \tag{4.3}$$

where $\mathbf{net}_p^{(m)} = \left(net_{p,1}^{(m)}, \ldots, net_{p,J_m}^{(m)}\right)^{\mathrm{T}}$, $\mathbf{W}^{(m-1)}$ is a $J_{m-1}$-by-$J_m$ matrix, $\mathbf{o}_p^{(m-1)} = \left(o_{p,1}^{(m-1)}, \ldots, o_{p,J_{m-1}}^{(m-1)}\right)^{\mathrm{T}}$, $\boldsymbol{\theta}^{(m)} = \left(\theta_1^{(m)}, \ldots, \theta_{J_m}^{(m)}\right)^{\mathrm{T}}$ is the bias vector, and $\boldsymbol{\phi}^{(m)}(\cdot)$ applies $\phi_i^{(m)}(\cdot)$ to the $i$th component of its vector argument. All $\phi_i^{(m)}(\cdot)$ are typically selected to be the same sigmoidal function; one can also select all $\phi_i^{(m)}(\cdot)$ in the first $M-1$ layers as the same sigmoidal function, and all $\phi_i^{(m)}(\cdot)$ in the $M$th layer as another continuous and differentiable function.
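As a concrete illustration of the forward relations (4.1)-(4.3), the following minimal NumPy sketch propagates a single example through an M-layer MLP that uses the same activation in every layer. The function name mlp_forward and the 3-2-1 example dimensions are illustrative assumptions, not notation from the text.

import numpy as np

def sigmoid(v):
    # Logistic sigmoid, a common choice for phi^{(m)}(.)
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x_p, weights, biases, phi=sigmoid):
    """Forward pass of an M-layer MLP following (4.1)-(4.3).

    x_p     : input vector o_p^{(1)} of length J_1
    weights : [W^{(1)}, ..., W^{(M-1)}], W^{(m-1)} of shape (J_{m-1}, J_m)
    biases  : [theta^{(2)}, ..., theta^{(M)}], theta^{(m)} of length J_m
    phi     : activation applied componentwise
    Returns the network output y_hat_p = o_p^{(M)}.
    """
    o = np.asarray(x_p, dtype=float)   # o_p^{(1)} = x_p                                (4.1)
    for W, theta in zip(weights, biases):
        net = W.T @ o + theta          # net_p^{(m)} = (W^{(m-1)})^T o_p^{(m-1)} + theta (4.2)
        o = phi(net)                   # o_p^{(m)} = phi^{(m)}(net_p^{(m)})              (4.3)
    return o                           # y_hat_p = o_p^{(M)}

# Example: a 3-2-1 network (J_1 = 3, J_2 = 2, J_3 = 1) with random parameters.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 2)), rng.standard_normal((2, 1))]
thetas = [rng.standard_normal(2), rng.standard_normal(1)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), Ws, thetas))

Each pass through the loop applies (4.2) and then (4.3) once; the returned vector is the network output of (4.1).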


Fig. 4.1 Architecture of MLP: the inputs $x_1, \ldots, x_{J_1}$ pass through successive layers of summing units with weights $\mathbf{W}^{(m)}$, biases $\boldsymbol{\theta}^{(m)}$, and activations $\boldsymbol{\phi}^{(m)}(\cdot)$, producing the outputs $y_1, \ldots, y_{J_M}$

4.2 Universal Approximation

The MLP is a universal approximator. Its universal approximation capability stems from the nonlinearities used in the nodes. The universal approximation capability of four-layer MLPs has been addressed in [44, 114]. It has been mathematically proved that a three-layer MLP using a sigmoidal activation function can approximate any continuous multivariate function to any desired accuracy [22, 33, 43, 136]. Usually, a four-layer network can approximate the target with fewer connection weights; this may, however, introduce extra local minima [16, 114, 136]. Xiang et al. provided a geometrical interpretation of the MLP on the basis of the special geometrical shape of the activation function: for a target function with a flat surface located in the domain, a small four-layer MLP can generate better results [136]. The MLP is very efficient for function approximation in high-dimensional spaces. The error convergence rate of the MLP is independent of the input dimensionality, whereas conventional linear regression methods suffer from the curse of dimensionality.
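One standard way to state the three-layer result for a scalar-valued target (written here with $v_j$ denoting output-layer weights, a symbol not used elsewhere in this chapter) is: for any continuous $f$ on a compact set $K \subset \mathbb{R}^{J_1}$ and any $\varepsilon > 0$, there exist a hidden-layer size $J_2$ and parameters $\mathbf{w}_j$, $\theta_j$, $v_j$ such that

$$\sup_{\mathbf{x} \in K} \left| f(\mathbf{x}) - \sum_{j=1}^{J_2} v_j\, \phi\!\left(\mathbf{w}_j^{\mathrm{T}} \mathbf{x} + \theta_j\right) \right| < \varepsilon,$$

where $\phi(\cdot)$ is the sigmoidal activation of the hidden units; vector-valued targets are handled by applying the result componentwise.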