psda: A tool for extracting knowledge from symbolic data with an application in Brazilian educational data


METHODOLOGIES AND APPLICATION

Wagner J. F. Silva · Renata M. C. R. Souza · F. J. A. Cysneiros

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Symbolic polygonal data analysis is a new framework for extracting valuable knowledge from a new data structure: regular polygons built from data in classes, big data, and complex data. This paper introduces a toolbox for symbolic polygonal data, named psda, that contains the main descriptive measures for this type of variable, e.g., mean, variance, and correlation, as well as a polygonal linear regression model (plr). The toolbox is applied to the Brazilian Basic Education Assessment System (SAEB), giving county managers a new perspective for carrying out public policy in the Brazilian educational system. In the SAEB application, a hypothesis test showed that the polygonal linear regression model performed better than some symbolic interval regression models.

Keywords Polygonal data · psda · Symbolic data analysis · Regression · Descriptive measures · R

Communicated by V. Loia.

1 Introduction

Data analysis is a fundamental framework for extracting knowledge in biology, statistics, computer science, data mining, and other fields. It comprises many techniques developed over the years, e.g., the mean, variance, correlation, and graphical methods. For centuries, the object of study in data analysis has been a p-dimensional point in $\mathbb{R}^p$; this framework is known as classical data analysis. With technological advances, the structure of data keeps evolving every day. Unfortunately, classical data analysis is limited to the study of p-dimensional points.

Billard and Diday (2003, 2007) introduce a new type of data that accommodates complex and diverse data structures, e.g., histograms, probability distributions, intervals, and lists of categories. This type of data is called symbolic data, and its study is known as symbolic data analysis (SDA).

The first step in SDA is to build the symbolic dataset, whose rows are subsets of individual entities sharing a common property, called classes. To capture the variability of the individuals inside each class, these new units are described by variables that can take symbolic values. According to Diday (2016), these classes are new units at a higher level of generalization than individuals, and they make it possible to reduce the initial huge size of an input dataset by summarizing it. Moreover, classes can represent real units of interest to the data analyst. In this context, classes can be seen as a new data framework to be set up before extracting knowledge with data science methods and tools. The second step in SDA is to extend machine learning and statistical techniques to symbolic data. The class framework provides some advantages (Diday 2016):

– When the popu
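The first SDA step described in the introduction, aggregating individuals into classes described by symbolic values, can be illustrated with a minimal sketch. It assumes interval-valued variables (one of the symbolic types listed above) rather than the polygons used by psda, and it does not use or reproduce the psda API. The function name `build_interval_table` and the toy student records are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def build_interval_table(records, class_key, variables):
    """Aggregate individual records into classes.

    Each class is described, per variable, by the interval (min, max)
    of the values observed among its individuals.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for var in variables:
            grouped[rec[class_key]][var].append(rec[var])
    return {
        cls: {var: (min(vals), max(vals)) for var, vals in by_var.items()}
        for cls, by_var in grouped.items()
    }


# Hypothetical toy data: students (individuals) grouped by school (class).
students = [
    {"school": "A", "math": 210.0, "reading": 190.0},
    {"school": "A", "math": 250.0, "reading": 230.0},
    {"school": "B", "math": 180.0, "reading": 205.0},
    {"school": "B", "math": 240.0, "reading": 215.0},
    {"school": "B", "math": 200.0, "reading": 195.0},
]

symbolic = build_interval_table(students, "school", ["math", "reading"])
for school, description in sorted(symbolic.items()):
    print(school, description)
```

Five individual rows are summarized into two class rows, which reflects the size reduction attributed to the class framework; a polygonal representation would summarize each class differently, but the aggregation step is analogous.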
