Strategies for Multidimensional Data Visualization

In this chapter, an analytical review of methods for multidimensional data visualization is presented. The methods based on direct visualization and projections are described. Some quantitative criteria of the visualization quality are also introduced.

2.1 Direct Visualization

Direct data visualization is a graphical presentation of a data set that provides a qualitative understanding of its information content in a natural and direct way. Commonly used methods include scatter plot matrices, parallel coordinates, Andrews curves, Chernoff faces, stars, and dimensional stacking [94]. Direct visualization methods have no formally defined mathematical criterion for estimating visualization quality. Each of the features x_1, x_2, ..., x_n characterizing the object X_i = (x_i1, x_i2, ..., x_in), i ∈ {1, ..., m}, is represented in a visual form acceptable to a human being.
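The notation above can be made concrete with a minimal sketch, assuming the data set is stored as a NumPy array with one row per object and one column per feature. The toy values and the helper names `get_object` and `get_feature` are hypothetical and serve only as illustration; they do not come from the source.

```python
import numpy as np

# Hypothetical toy data (an assumption, not from the source):
# m = 4 objects, each described by n = 3 features.
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
])

# Rows are objects X_1..X_m, columns are features x_1..x_n.
m, n = X.shape


def get_object(data, i):
    """Return object X_i as a length-n vector (i is 1-based, as in the text)."""
    return data[i - 1]


def get_feature(data, j):
    """Return the values of feature x_j over all m objects (j is 1-based)."""
    return data[:, j - 1]
```

With this layout, a direct visualization method maps either the rows (objects) or the columns (features) of the array to graphical elements.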

2.1.1 Geometric Methods

Geometric visualization methods are methods in which multidimensional points are displayed using the axes of a selected geometric shape [187]. Scatter plots are among the most commonly used techniques for representing data on a plane R^2 or in a space R^3. Points are displayed in the classic (x, y) or (x, y, z) format [73, 74, 94]. Usually, two-dimensional (n = 2) or three-dimensional (n = 3) points are represented by this technique. A two-dimensional example is shown in Fig. 2.1.

G. Dzemyda et al., Multidimensional Data Visualization: Methods and Applications, Springer Optimization and Its Applications 75, DOI 10.1007/978-1-4419-0236-8_2, © Springer Science+Business Media, LLC 2013

Fig. 2.1 Scatter plot of two-dimensional points (axes x and y, each ranging from 0 to 1)

Using a matrix of scatter plots, scatter plots can also be applied to visualize data of higher dimensionality. The matrix of scatter plots is an array of scatter plots
displaying all possible pairwise combinations of features. If n-dimensional data are analyzed, the number of scatter plots is equal to n(n − 1)/2. On the diagonal of the matrix of scatter plots, a graphical statistical characteristic of each feature can be presented, for example, a histogram of its values. The matrix of scatter plots is useful for observing all possible pairwise interactions between features [73]. The matrix of scatter plots of the Iris data is presented in Fig. 2.2 (see the description of the data set in Appendix A). We can see that Setosa flowers (blue) differ significantly from Versicolor (red) and Virginica (green). The scatter plots can also be positioned in a non-array format (circular, hexagonal, etc.). Some variations of scatter plot matrices have also been developed [94].

In multiline graphs, we draw n curves (line graphs) that represent the features as functions of the order number of the objects [94]. An example for the Auto MPG data is presented in Fig. 2.3 (see the description of the data set in Appendix A). Andrews curve