Visualization Support to Interactive Cluster Analysis


1 Fraunhofer Institute IAIS, Sankt Augustin, Germany
2 City University London, London, UK
{gennady.andrienko,natalia.andrienko}@iais.fraunhofer.de

Abstract. We demonstrate interactive visual embedding of partition-based clustering of multidimensional data using methods from the open-source machine learning library Weka. According to the visual analytics paradigm, knowledge is gradually built and refined by a human analyst through iterative application of clustering with different parameter settings and to different data subsets. To show clustering results to the analyst, cluster membership is typically represented by color coding. Our tools keep the cluster colors consistent between different steps of the process. We shall demonstrate two-way clustering of spatial time series, in which clustering is applied both to places and to time steps.

1 Introduction

Our system V-Analytics [1] enables analytical workflows that combine partition-based clustering, using methods from the open-source library Weka [2], with interactive visualizations for effective human-computer data analysis and knowledge building. According to the visual analytics paradigm, knowledge is built and refined gradually by iterative application of analytical techniques, such as clustering, with different parameter settings and to different data subsets.

A typical approach to visualizing clustering results is to represent cluster membership on various data displays by color coding [3-5]. To properly support a process involving iterative clustering, the colors assigned to the clusters need to be consistent between different steps. We have designed special color assignment techniques that preserve this consistency.

When data are stored in a table, clustering can be applied to the table rows or to the columns [6]. Spatial time series, i.e., attribute values referring to different spatial locations and time steps, can be represented in a table with the rows corresponding to the locations and the columns to the time steps. Two-way clustering groups the locations based on the similarity of the local temporal variations of the attribute values, and the time steps based on the similarity of the spatial situations, i.e., the distributions of the attribute values over the set of locations [5].
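
The idea of two-way clustering can be illustrated with a minimal Python sketch. It is not the V-Analytics implementation and does not call Weka: a tiny k-means with deterministic farthest-point seeding stands in for whichever partition-based method is chosen, and the function names and the example table are hypothetical. The same routine is applied once to the table rows (places) and once to the transposed table (time steps).

```python
from math import dist


def kmeans(rows, k, iterations=20):
    """Tiny k-means returning one cluster label per row.

    Assumption for this sketch: seeds are picked by farthest-point
    selection so that the result is deterministic.
    """
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next seed: the row farthest from all seeds chosen so far.
        centers.append(
            list(max(rows, key=lambda r: min(dist(r, c) for c in centers)))
        )
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(r, centers[j])) for r in rows]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_way_clustering(table, k_places, k_times):
    """Cluster the rows (places) and, separately, the columns (time steps)."""
    place_labels = kmeans(table, k_places)
    columns = [list(col) for col in zip(*table)]  # transpose the table
    time_labels = kmeans(columns, k_times)
    return place_labels, time_labels


# Hypothetical spatial time series: 4 places (rows) x 4 time steps (columns).
table = [
    [1, 1, 9, 9],
    [2, 1, 8, 9],
    [9, 9, 1, 1],
    [8, 9, 2, 1],
]
place_labels, time_labels = two_way_clustering(table, k_places=2, k_times=2)
print(place_labels, time_labels)
```

In this toy table, the first two places share one temporal profile and the last two share another, and the time steps split the same way; the row and column groupings are computed independently of each other.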

2 Interactive Two-Way Cluster Analysis of Spatial Time Series

To support iterative data analysis and knowledge building with the use of clustering, V-Analytics provides the following functionality:

© Springer International Publishing Switzerland 2015. A. Bifet et al. (Eds.): ECML PKDD 2015, Part III, LNAI 9286, pp. 337–340, 2015. DOI: 10.1007/978-3-319-23461-8_43

G. Andrienko and N. Andrienko

Fig. 1. Left: a projection of cluster centers onto a color plane; right: the spatial distribution of the cluster membership; center: a legend showing cluster colors and sizes.
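
One way to picture color assignment from a color plane is the following Python sketch. It rests on assumptions that the text does not confirm: the cluster centers are taken as already projected to coordinates in the unit square, and the plane is modeled as an HSV wheel in which the angle around the plane center gives the hue and the distance from the center gives the saturation. The function name color_at and the example coordinates are hypothetical. Because the color depends only on the position in a fixed plane, a center that stays in place between clustering runs keeps its color, and nearby centers receive similar colors.

```python
import colorsys
from math import atan2, hypot, pi


def color_at(x, y, center=(0.5, 0.5), radius=0.5):
    """Return a hex RGB color for a position in a fixed 2D color plane.

    Assumed model: hue from the angle around `center`, saturation from
    the distance to `center` (clamped at `radius`), full brightness.
    """
    dx, dy = x - center[0], y - center[1]
    hue = (atan2(dy, dx) % (2 * pi)) / (2 * pi)
    saturation = min(hypot(dx, dy) / radius, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Hypothetical projected cluster centers (coordinates in the unit square).
projected_centers = {
    "cluster 1": (1.0, 0.5),
    "cluster 2": (0.5, 0.5),
    "cluster 3": (0.2, 0.8),
}
cluster_colors = {name: color_at(x, y) for name, (x, y) in projected_centers.items()}
print(cluster_colors)
```

A design consequence of fixing the plane center and radius, rather than deriving them from the current set of centers, is that adding or removing clusters in a later step does not shift the colors of the clusters that remain.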

Fig. 2. Top: the 2D time histograms (9 days x 24 hours) correspond to different clusters; the bars in the cells represent the cluster means. Bottom: the time graphs show the variations of the absolute (left) an