Machine Learning

11.1 Clustering Patterns

Patterns are categorized into a number of classes. Pattern recognition is the problem of identifying the class to which a given pattern belongs. When a divergence is defined on the manifold of patterns, classification is carried out by using the divergence. We begin with the problem of obtaining a representative of a cluster of patterns, called the center of the cluster. When patterns are categorized into clusters, pattern recognition determines the cluster to which a given pattern is supposed to belong, based on closeness measured by the divergence. Another problem is to divide a non-labeled aggregate of patterns into a set of clusters; this is the problem of clustering. A generalized k-means method is presented by using a divergence. The entire pattern space is divided into regions based on the representatives of clusters; such a division is called a divergence-based Voronoi diagram. When patterns are generated randomly subject to a probability distribution depending on each class, we have a stochastic version of the above problems. Information geometry is useful for understanding these problems in terms of divergence.
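The generalized k-means loop mentioned above alternates two steps: assign each pattern to the cluster whose representative is closest under the divergence, then recompute each representative from its cluster. The following is a minimal sketch of that loop, assuming half the squared Euclidean distance as the divergence (the text allows any Bregman divergence) and the coordinate mean as the cluster representative; all names and the toy data are illustrative, not from the text.

```python
import numpy as np

def D(x, y):
    """Divergence between patterns; here half the squared Euclidean distance."""
    return 0.5 * np.sum((x - y) ** 2)

def divergence_kmeans(X, k, n_iter=50, seed=0):
    """Sketch of divergence-based k-means: assign to nearest representative,
    then re-estimate each representative as its cluster's mean."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center with respect to the divergence D
        labels = np.array([np.argmin([D(x, c) for c in centers]) for x in X])
        # update step: mean of each cluster (keep old center if cluster empties)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# two well-separated synthetic blobs of 20 patterns each
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5.0, 0.1, (20, 2))])
labels, centers = divergence_kmeans(X, k=2)
```

With a general Bregman divergence the same two-step structure applies; only the divergence and the notion of cluster center change, as developed in the following subsections.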

11.1.1 Pattern Space and Divergence

Let us consider patterns represented by vectors x belonging to a pattern manifold X. We study the case where a divergence D[x : x′] is defined between two patterns x and x′. In a Euclidean space, we have

  D[x : x′] = (1/2) Σ_i (x_i − x′_i)²,   (11.1)

which is half of the square of the Euclidean distance. We consider a general dually flat manifold induced by a Bregman divergence. For the sake of notational convenience, we suppose that pattern x is represented in the dual affine coordinate system, so that it is given by the η-coordinates as

  η = x.   (11.2)

Then, we use the dual divergence between two patterns x and x′,

  D_φ[x : x′] = φ(x) − φ(x′) − ∇φ(x′) · (x − x′),   (11.3)

which is constructed from a dual convex function φ. We will later use the primal affine coordinate system θ given by the Legendre transformation

  θ = ∇φ(η) = ∂φ(η)/∂η.   (11.4)

The primal convex function ψ(θ) is given by the Legendre relations

  ψ(θ) = −φ(η) + θ · η,   (11.5)

  η = ∇ψ(θ).   (11.6)

© Springer Japan 2016. S. Amari, Information Geometry and Its Applications, Applied Mathematical Sciences 194, DOI 10.1007/978-4-431-55978-8_11
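The dual structure of (11.3)–(11.6) can be checked numerically. The sketch below takes φ to be the negative entropy φ(η) = Σ η_i (log η_i − 1) on positive vectors as an illustrative dual convex function (an assumption; the text leaves φ general), whose Legendre transform is ψ(θ) = Σ exp(θ_i), and verifies the Legendre relation (11.5) and the coordinate round trip (11.4)/(11.6).

```python
import numpy as np

# Illustrative dual convex function (assumed, not fixed by the text):
# negative entropy phi(eta) = sum eta_i (log eta_i - 1) on positive vectors.

def phi(eta):
    return np.sum(eta * (np.log(eta) - 1.0))

def grad_phi(eta):
    """theta = nabla phi(eta), the Legendre transformation of Eq. (11.4)."""
    return np.log(eta)

def psi(theta):
    """Primal convex function; Legendre dual of phi."""
    return np.sum(np.exp(theta))

def grad_psi(theta):
    """eta = nabla psi(theta), Eq. (11.6)."""
    return np.exp(theta)

def D_phi(x, x2):
    """Dual Bregman divergence of Eq. (11.3)."""
    return phi(x) - phi(x2) - grad_phi(x2) @ (x - x2)

eta = np.array([0.2, 0.5, 1.3])
theta = grad_phi(eta)                       # primal coordinates of eta

# Legendre relation (11.5): psi(theta) = -phi(eta) + theta . eta
assert np.isclose(psi(theta), -phi(eta) + theta @ eta)
# Coordinate round trip (11.6): nabla psi recovers eta
assert np.allclose(grad_psi(theta), eta)
```

Any other strictly convex φ (for instance φ(η) = ½‖η‖², which reduces (11.3) to the Euclidean case (11.1)) passes the same checks.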

11.1.2 Center of Cluster

Let C be a cluster consisting of k patterns x_1, …, x_k. We search for the representative of C, which should be as close to all the members of C as possible (Fig. 11.1). To obtain such a representative, we calculate the average of the dual divergences from the members of the cluster to a point η, as

  D_φ[C : η] = (1/k) Σ_{x_i ∈ C} D_φ[x_i : η].   (11.7)

Fig. 11.1 φ-center η_c of cluster C = {x_1, …, x_k}
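The average divergence (11.7) can be evaluated directly. The sketch below again assumes the negative entropy as φ (an illustrative choice; the text keeps φ general) and checks numerically that the arithmetic mean of the cluster does at least as well as randomly perturbed candidate points, a known property of Bregman divergences averaged in this direction.

```python
import numpy as np

def phi(eta):
    # illustrative dual convex function (assumed): negative entropy
    return np.sum(eta * (np.log(eta) - 1.0))

def D_phi(x, eta):
    """Bregman divergence of Eq. (11.3); nabla phi(eta) = log eta here."""
    return phi(x) - phi(eta) - np.log(eta) @ (x - eta)

def D_cluster(C, eta):
    """Average divergence from the cluster members to eta, Eq. (11.7)."""
    return np.mean([D_phi(x, eta) for x in C])

rng = np.random.default_rng(0)
C = rng.uniform(0.5, 2.0, size=(10, 3))   # a cluster of k = 10 patterns
eta_c = C.mean(axis=0)                    # candidate center: arithmetic mean

# Setting the eta-gradient of (11.7) to zero gives eta equal to the mean of
# the x_i, so no perturbed candidate should achieve a smaller average value.
for _ in range(100):
    eta = eta_c + rng.normal(0.0, 0.05, size=3)
    assert D_cluster(C, eta) >= D_cluster(C, eta_c)
```

The gradient computation in the comment uses only the form (11.3): the terms ∇φ(η) cancel, leaving −∇²φ(η)(x̄ − η), which vanishes exactly at the mean x̄ since φ is strictly convex.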


The minimizer of (11.7) is called the φ-center of cluster C due to the divergence D_φ. If we use the θ-coordinates, this is written as

  D_ψ[θ : C] = (1/k) Σ D_ψ[θ :