Detecting and Recognizing Outliers in Datasets via Linguistic Information and Type-2 Fuzzy Logic
Adam Niewiadomski¹ • Agnieszka Duraj¹
Received: 25 July 2019 / Revised: 1 February 2020 / Accepted: 11 July 2020
© The Author(s) 2020
Abstract Uncertainty appearing in datasets (stochastic, linguistic, measurement-related, etc.), if not handled properly, may negatively affect information analysis or retrieval procedures. One possible method of dealing with uncertain (rare, strange, unexampled) data is to treat them as "outliers" or "exceptions". Among the different definitions and algorithms for detecting outliers, we are especially interested in those based on linguistic information represented with type-2 fuzzy logic. We introduce new definitions of outliers in datasets in terms of fuzzy properties and linguistically expressed quantities of objects possessing them. Next, new algorithms for detecting outlying objects are presented, to answer whether or not outliers appear in a dataset. Finally, recognition algorithms are presented and exemplified to enumerate the particular objects that are outliers (e.g., to exclude them from further consideration). The novelty of this contribution is that we define, detect, and recognize outliers using linguistic information represented mostly by type-2 fuzzy sets and logic (when no other information, such as measures or distances, is accessible), and in this way we supersede some earlier approaches based on similar but relatively limited assumptions.

Keywords Outliers in datasets · Detecting outliers · Recognizing outliers · Outliers defined via linguistic information · Type-2 linguistic quantification · Type-2 fuzzy logic
Adam Niewiadomski [email protected]
¹ Institute of Information Technology, Lodz University of Technology, ul. Wólczańska 215, 90-924 Łódź, Poland
1 Introduction
Although it may sound like a truism, data analysis methods applied to classification, grouping, machine learning, etc. are currently undergoing intensive development. These methods address various tasks, selected and targeted to the purposes of different systems. It must be pointed out that collecting and processing data, frequently from unknown sources, involves some uncertainty, which mostly appears as imprecise and/or incomplete information. Common sources of uncertainty are measurements, probabilistic methods (stochastic uncertainty), a lack of credibility of information (information uncertainty), and imprecise descriptions of phenomena in natural language (linguistic uncertainty). One way of handling uncertain data is to regard them as outliers. In natural language, an "outlier" or "exception" (also anomaly, deviation, abnormality, aberration, etc.) means something unique, rare, infrequent, special, specific, sensational, or unexampled. These terms suggest that some features of objects, situations, or phenomena are unobvious or unusual to the recipients considering or observing them. Outliers, if they occur, are especially likely to be noticed as highlighted or differing against a background