Detecting and Recognizing Outliers in Datasets via Linguistic Information and Type-2 Fuzzy Logic


Adam Niewiadomski¹ • Agnieszka Duraj¹

Received: 25 July 2019 / Revised: 1 February 2020 / Accepted: 11 July 2020
© The Author(s) 2020

Abstract

Uncertainty appearing in datasets (stochastic, linguistic, measurement-related, etc.), if not handled properly, may negatively affect information analysis or retrieval procedures. One possible method of dealing with uncertain (rare, strange, unexampled) data is to treat them as "outliers" or "exceptions". Among the different definitions and algorithms for detecting outliers, we are especially interested in those based on linguistic information represented with type-2 fuzzy logic. We introduce new definitions of outliers in datasets in terms of fuzzy properties and linguistically expressed quantities of objects possessing them. Next, new algorithms for detecting outlying objects are presented, to answer whether or not outliers appear in a dataset. Finally, recognition algorithms are presented and exemplified; these enumerate the particular objects that are outliers (e.g., to exclude them from further consideration). The novelty of this contribution is that we define, detect and recognize outliers using linguistic information represented mostly by type-2 fuzzy sets and logic (when no other information, such as measures or distances, is accessible), and in this way we supersede some earlier approaches based on similar but relatively limited assumptions.

Keywords Outliers in datasets · Detecting outliers · Recognizing outliers · Outliers defined via linguistic information · Type-2 linguistic quantification · Type-2 fuzzy logic

✉ Adam Niewiadomski [email protected]

¹ Institute of Information Technology, Lodz University of Technology, ul. Wólczańska 215, 90-924 Łódź, Poland

1 Introduction

Although it sounds like a truism, data analysis methods applied to classification, grouping, machine learning, etc. are currently undergoing intensive development. These methods address various tasks selected and tailored to the purposes of different systems. What must be pointed out here is that collecting and processing data, frequently from unknown sources, involves some uncertainty, mostly appearing as imprecise and/or incomplete information. Common sources of uncertainty are measurements, probabilistic methods (stochastic uncertainty), lack of credibility of information (information uncertainty), and imprecise descriptions of phenomena in natural language (linguistic uncertainty). One way of handling uncertain data is to treat them as outliers. An "outlier" or "exception" (also anomaly, deviation, abnormality, aberration, etc.) in natural language means something unique, rare, infrequent, special, specific, sensational, or unexampled. These terms suggest that some features of objects, situations, or phenomena are unobvious or unusual to the recipients considering or observing them. Outliers, if they occur, are especially likely to be noticed as standing out or differing against a background
