Hubness-based fuzzy measures for high-dimensional k -nearest neighbor classification

PDF / 994,123 Bytes
14 Pages / 595.276 x 790.866 pts Page_size
23 Downloads / 253 Views

ORIGINAL ARTICLE

Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification Nenad Tomasˇev • Milosˇ Radovanovic´ Dunja Mladenic´ • Mirjana Ivanovic´

•

Received: 9 March 2012 / Accepted: 19 November 2012 Springer-Verlag Berlin Heidelberg 2012

Abstract Most data of interest today in data-mining applications is complex and is usually represented by many different features. Such high-dimensional data is by its very nature often quite difficult to handle by conventional machine-learning algorithms. This is considered to be an aspect of the well known curse of dimensionality. Consequently, high-dimensional data needs to be processed with care, which is why the design of machine-learning algorithms needs to take these factors into account. Furthermore, it was observed that some of the arising highdimensional properties could in fact be exploited in improving overall algorithm design. One such phenomenon, related to nearest-neighbor learning methods, is known as hubness and refers to the emergence of very influential nodes (hubs) in k-nearest neighbor graphs. A crisp weighted voting scheme for the k-nearest neighbor classifier has recently been proposed which exploits this notion. In this paper we go a step further by embracing the This is an extended version of the paper Hubness-based fuzzy measures for high-dimensional k-nearest neighbor classification, which was presented at the MLDM 2011 conference [27]. N. Tomasˇev (&) D. Mladenic´ Institute Jozˇef Stefan, Artificial Intelligence Laboratory, Jozˇef Stefan International Postgraduate School, Jamova 39, 1000 Ljubljana, Slovenia e-mail: [email protected] D. Mladenic´ e-mail: [email protected] M. Radovanovic´ M. Ivanovic´ Department of Mathematics and Informatics, University of Novi Sad, Trg D. Obradovic´a 4, 21000 Novi Sad, Serbia e-mail: [email protected] M. Ivanovic´ e-mail: [email protected]

soft approach, and propose several fuzzy measures for k-nearest neighbor classification, all based on hubness, which express fuzziness of elements appearing in k-neighborhoods of other points. Experimental evaluation on real data from the UCI repository and the image domain suggests that the fuzzy approach provides a useful measure of confidence in the predicted labels, resulting in improvement over the crisp weighted method, as well as the standard kNN classifier. Keywords Classification k-nearest neighbor Fuzzy Hubs Curse of dimensionality

1 Introduction High-dimensional data is ubiquitous in modern applications. It arises naturally when dealing with text, images, audio, data streams, medical records, etc. The impact of this high dimensionality is manyfold. It is a well known fact that many machine-learning algorithms are plagued by what is usually termed the curse of dimensionality. This comprises a set of properties that tend to become more pronounced as the dimensionality of data increases. First and foremost is the unavoidable sparsity of data. In highdimensional spaces all data is sparse, meaning that there is not enou

Data Loading...

Hubness-based fuzzy measures for high-dimensional k -nearest neighbor classification

Recommend Documents

Performance Analysis of Nearest Neighbor, K-Nearest Neighbor and Weighted K-Nearest Neighbor for the Classification of A

K-Nearest Neighbor Queries Over Encrypted Data

Using a Genetic Algorithm for Editing k-Nearest Neighbor Classifiers

A Weighted Combination Method of Multiple K-Nearest Neighbor Classifiers for EEG-Based Cognitive Task Classification

Text Classification Using K-Nearest Neighbor Algorithm and Firefly Algorithm for Text Feature Selection

Towards enriching the quality of k -nearest neighbor rule for document classification

Nearest Neighbor

Reverse Nearest Neighbor Search

Nearest Neighbor Query

Efficient and secure k -nearest neighbor query on outsourced data

Robust Earthquake Cluster Analysis Based on K-Nearest Neighbor Search

Text document classification using fuzzy rough set based on robust nearest neighbor (FRS-RNN)