Improved analysis of spectral algorithm for clustering

  • PDF / 381,905 Bytes
  • 23 Pages / 439.37 x 666.142 pts Page_size
  • 5 Downloads / 274 Views

DOWNLOAD

REPORT


Improved analysis of spectral algorithm for clustering Tomohiko Mizutani1 Received: 4 December 2019 / Accepted: 28 August 2020 © Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Spectral algorithms are graph partitioning algorithms that partition a node set of a graph into groups by using a spectral embedding map. Clustering techniques based on the algorithms are referred to as spectral clustering and are widely used in data analysis. To gain a better understanding of why spectral clustering is successful, Peng et al. (In: Proceedings of the 28th conference on learning theory (COLT), vol 40, pp 1423–1455, 2015) and Kolev and Mehlhorn (In: 24th annual European symposium on algorithms (ESA 2016), vol 57, pp 57:1–57:14, 2016) studied the behavior of a certain type of spectral algorithm for a class of graphs, called well-clustered graphs. Specifically, they put an assumption on graphs and showed the performance guarantee of the spectral algorithm under it. The algorithm they studied used the spectral embedding map developed by Shi and Malik (IEEE Trans Pattern Anal Mach Intell 22(8):888–905, 2000). In this paper, we improve on their results, giving a better performance guarantee under a weaker assumption. We also evaluate the performance of the spectral algorithm with the spectral embedding map developed by Ng et al. (In: Advances in neural information processing systems 14 (NIPS), pp 849–856, 2001). Keywords Spectral algorithm · Clustering · Graph partitioning · Performance guarantee

1 Introduction Spectral algorithms are algorithms for partitioning a node set of a graph into groups. They work well at clustering data and are often used for that purpose in data analysis. The clustering techniques based on them are referred to as spectral clustering. Here, we will consider a graph representing relationships between items; i.e., each node corresponds to an item and edges only connect pairs of similar nodes. The clustering task is to partition the node set into groups such that nodes in the same group are well

B 1

Tomohiko Mizutani [email protected] Department of Mathematical and Systems Engineering, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu City 432-8561, Japan

123

T. Mizutani

connected, while nodes in different groups are poorly connected. Spectral algorithms tend to produce such node groups in practice. Spectral algorithms work as follows. The input is a graph and a positive integer k. Two steps are performed on the input: (1) a spectral embedding map F is constructed that maps the nodes of a graph to points in a k-dimensional real space; then, a set X of points F(v) is formed for nodes v; (2) a k-way partition of the node set is found by applying a classical clustering algorithm to X (the output is the node partition). The algorithms usually employ either of two spectral embedding maps, i.e., the one of Shi and Malik [11] or the one of Ng et al. [8], in the first step and k-means clustering in the second step. Interested readers should see the tutorial [12] on spectral cl