k-Morik: Mining Patterns to Classify Cartified Images of Katharina

1 UMR CNRS 5516, Laboratoire Hubert-Curien, Université de Lyon, Université de St-Etienne, 42000 St-Etienne, France [email protected]
2 Department of Math and Computer Science, University of Antwerp, Antwerp, Belgium [email protected]

Abstract. When building a traditional Bag of Visual Words (BOW) for image classification, the k-Means algorithm is usually applied to a large set of high-dimensional local descriptors to build a visual dictionary. However, it is very likely that, to find a good visual vocabulary, only a sub-part of the descriptor space of each visual word is truly relevant for a given classification problem. In this paper, we explore a novel framework for creating a visual dictionary based on Cartification and Pattern Mining instead of the traditional k-Means algorithm. Preliminary experimental results on face images show that our method is able to successfully differentiate photos of Elisa Fromont and Bart Goethals from photos of Katharina Morik.
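For reference, the traditional k-Means dictionary step that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes NumPy, uses random vectors as stand-ins for real local descriptors, and the function names (`build_codebook`, `assign_words`, `bow_histogram`) as well as the parameter values are hypothetical choices made for this example.

```python
import numpy as np


def assign_words(descriptors, centroids):
    """Index of the nearest centroid, using Euclidean distance over all dimensions."""
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    return np.argmin((diffs ** 2).sum(axis=2), axis=1)


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain Lloyd's k-Means and return k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bow_histogram(image_descriptors, centroids):
    """L1-normalised histogram of visual-word occurrences for one image."""
    words = assign_words(image_descriptors, centroids)
    counts = np.bincount(words, minlength=len(centroids)).astype(float)
    return counts / counts.sum()


# Synthetic placeholders for high-dimensional local descriptors (assumed 128-D).
rng = np.random.default_rng(42)
train_descriptors = rng.normal(size=(500, 128))
codebook = build_codebook(train_descriptors, k=8)
histogram = bow_histogram(rng.normal(size=(60, 128)), codebook)
print(histogram.shape)
```

Note that `assign_words` weighs every descriptor dimension equally when it computes distances, which is the property of k-Means that the paper questions.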

E. Fromont and B. Goethals. © Springer International Publishing Switzerland 2016. S. Michaelis et al. (Eds.): Morik Festschrift, LNAI 9580, pp. 377–385, 2016. DOI: 10.1007/978-3-319-41706-6_21

1 Introduction

Classification of images is of considerable interest in many image processing and computer vision applications. A common approach to represent the image content is to use histograms of color, texture and edge direction features [8,29]. Although they are computationally efficient, such histograms only use global information and thus only provide a crude representation of the image content. One trend in image classification is towards the use of bag-of-visual-words (BOW) features [11] that come from the bag-of-words representation of text documents [27]. The creation of these features requires four basic steps: (i) keypoint detection, (ii) keypoint description, (iii) codebook creation, and (iv) image representation. Keypoints refer to small regions of interest in the image. They can be sampled densely [16], randomly [31] or extracted with various detectors [21] commonly used in computer vision. Once extracted, the keypoints are characterized using a local descriptor which encodes a small region of the image in a D-dimensional vector. The most widely used keypoint descriptor is the 128-dimensional SIFT descriptor [20]. Once the keypoints are described, the collection of descriptors of all images of a training set is clustered, often using the k-Means algorithm, to obtain a visual codebook. Each cluster representative (typically the centroid) is considered as a visual word in a visual dictionary, and each image can be mapped into this new space of visual words, leading to a bag-of-visual-words (or a histogram of visual words) representation. The k-Means algorithm considers all the feature dimensions (128 for SIFT descriptors) when computing the distances used to estimate clusters. As a consequence, the nearest neighbor estimation can be affected by noisy information from irrelevant feature dimensions [6,18]. Furthermore, the k-Means algorithm forces every keypoi