iMass : an approximate adaptive clustering algorithm for dynamic data using probability based dissimilarity

PDF / 509,315 Bytes
3 Pages / 612.284 x 802.205 pts Page_size
27 Downloads / 357 Views

iMass: an approximate adaptive clustering algorithm for dynamic data using probability based dissimilarity Panthadeep BHATTACHARJEE

, Pinaki MITRA

Department of Computer Science and Engineering, Indian Institute of Technology, Guwahati 781039, India c Higher Education Press 2020

1

Research goals

The primary goal of this work is to propose an approximate incremental solution to MBSCAN [1] known as the iMass clustering algorithm for processing point-wise insertions dynamically (Table 1, Fig. 1). MBSCAN acts as a robust baseline because it replaces distance based density measure of DBSCAN [2] by a mass-based [3, 4] approach.

2

Background

The MBSCAN algorithm comprises of the following components: iForest, mass based dissimilarity matrix(mass-matrix), μ-neighborhood mass, core and non-core points, clusters and outliers. The iForest combines multiple iT rees, each being a binary tree representing a particular hierarchical partitioning model Hi , i = 1, 2, 3, ..., t, where t denotes the maximum numTable 1 Dataset

Libras

Segment

Wine

S1

ber of iT rees. The following Eq. (1) defines the mass of a region containing points a and b:

Mr (a, b|H; D) =

1(c ∈ r),

where r is any region, D is the dataset. ∀a, b ∈ D, we have Eq. (2) defining the mass of smallest local region [1] containing a and b: R(a, b|H; D) = argminr⊂H

s.t. {a,b}∈r

1(c ∈ r).

241 270 300 320 360 1541 1600 2000 2200 2310 119 125 155 170 178 601 650 700 800 900

MBSCAN/s 13 14 14 14 16 7 7 7 7 7 14 14 13 14 14 14 14 14 14 14

Received April 3, 2019; accepted December 12, 2019 E-mail: [email protected]; [email protected]

iForest iMass/s 0.00258 0.002525 0.002359 0.00317 0.001829 0.00196 0.00065 0.00067 0.00068 0.00068 0.0015 0.00207 0.00162 0.00150 0.002 0.00218 0.00182 0.00181 0.00199 0.00337

(2)

c∈D

Mass based dissimilarity [1] of a and b (Eq. (3)) is defined as the expected probability of R(a,b | H;D):

Eﬃciency achieved by building iForest and mass-matrix incrementally due to iMass. Datasets (See UCI Machine Learning Repository) D

(1)

r⊆H s.t. {a,b}∈r c∈D

Reduction/% 99.98 99.98 99.98 99.97 99.98 99.97 99.99 99.99 99.99 99.99 99.98 99.98 99.98 99.98 99.98 99.98 99.98 99.98 99.98 99.97

MBSCAN/s 4.4741 5.1014 6.8980 10.1717 16.5059 444.12 604.434 769.443 948.945 1189.24 0.5356 0.6408 1.3750 1.6561 2.057 49.102 79.1108 105.37 134.963 193.92

Mass-matrix iMass/s 0.4009 0.4953 0.6208 0.7097 0.9130 13.6027 14.5633 22.516 27.3928 30.1611 0.0985 0.1079 0.1701 0.2021 0.2134 2.6358 3.1398 3.69813 5.0534 6.3625

Reduction/% 91.03 90.28 90.99 93.02 94.46 96.93 97.59 97.07 97.11 97.46 81.60 83.15 86.96 87.79 89.62 94.63 96.03 96.49 96.25 96.71

2

Front. Comput. Sci., 2021, 15(2): 152314

Fig. 1 Libras and Segment dataset evaluation. (a) CPU execution time for iMass and MBSCAN (Libras dataset); (b) Percentage of aﬀected nodes from iForest post insertion (Libras dataset); (c) CPU execution time for iMass and MBSCAN (Segment dataset); (d) Percentage of aﬀected nodes from iForest post insertion (Segment dataset)

me (a, b|H; D) = EH(D) [PF

Data Loading...

iMass : an approximate adaptive clustering algorithm for dynamic data using probability based dissimilarity

Recommend Documents

An Approximate Algorithm for Robust Adaptive Beamforming

An improved density-based adaptive p -spectral clustering algorithm

Adaptive multi-resolution graph-based clustering algorithm for electrofacies analysis

ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

Data clustering using multivariant optimization algorithm

A Data Clustering Algorithm Using Cuckoo Search

An improved artificial bee colony algorithm based on whale optimization algorithm for data clustering

An Evolutionary Algorithm for Adaptive Online Services in Dynamic Environment

Approximate Clustering of Time Series Using Compact Model-Based Descriptions

A multi-agent-based algorithm for data clustering

An Adaptive Fuzzy Clustering Algorithm Based on Multi-threshold for Infrared Image Segmentation

An effective method using clustering-based adaptive decomposition and editing-based diversified oversamping for multi-cl