A Survey on Pre-Processing Educational Data
Cristóbal Romero, José Raúl Romero and Sebastián Ventura
Abstract
Data pre-processing is the first step in any data mining process, and it is one of the most important but least studied tasks in educational data mining research. Pre-processing transforms the available raw educational data into a format ready to be used by a data mining algorithm for solving a specific educational problem. However, most authors rarely describe this important step, and only a few works focus on the pre-processing of data. To address the lack of specific references on this topic, this paper surveys the task of preparing educational data. First, it describes different types of educational environments and the data they provide. Then, it shows the main tasks and issues in the pre-processing of educational data, mainly using Moodle data in the examples. Next, it describes some general and specific pre-processing tools. Finally, some conclusions and future research lines are outlined.
Keywords Educational data mining process, Data pre-processing, Data preparation, Data transformation
Abbreviations
AIHS  Adaptive and intelligent hypermedia system
ARFF  Attribute-relation File Format
CBE  Computer-based education
CSV  Comma-separated values
DM  Data mining
EDM  Educational data mining
HTML  Hypertext Markup Language
ID  Identifier
IP  Internet Protocol
ITS  Intelligent tutoring system
KDD  Knowledge discovery in databases
LMS  Learning management system
MCQ  Multiple choice question
MIS  Management information system
MOOC  Massive Open Online Course
OLAP  Online Analytical Processing
SQL  Structured Query Language
WUM  Web Usage Mining
WWW  World Wide Web
XML  Extensible Markup Language
C. Romero (✉), J. R. Romero, S. Ventura: Department of Computer Science and Numerical Analysis, University of Córdoba, Campus de Rabanales, Edificio C2-Albert Einstein, Córdoba, Spain
A. Peña-Ayala (ed.), Educational Data Mining, Studies in Computational Intelligence 524, DOI: 10.1007/978-3-319-02738-8_2, Springer International Publishing Switzerland 2014
2.1 Introduction
Educational Data Mining (EDM) is a field that applies Data Mining (DM) algorithms to different types of educational data in order to resolve educational research issues [1]. Data mining, or Knowledge Discovery in Databases (KDD), is the automatic extraction of implicit and interesting patterns from large data collections [2]. The first step in the KDD process is the transformation of data into an appropriate form for the mining process, which is usually called data pre-processing in data mining systems [3]. It allows raw data to be transformed into a shape suitable for resolving a problem using a specific mining method, technique or algorithm [4]. In fact, the better the raw data are pre-processed, the more useful information can be discovered. However, the data pre-pro