Data Pre-processing of Student e-Learning Logs



Abstract Educational data mining faces the challenge of systematic knowledge discovery in large data streams to support educational decision-making. While much research effort has gone into examining study patterns with various data mining algorithms, how to prepare data so that they are suitable and effective for these algorithms (i.e., the phase of data pre-processing) has not been investigated in detail. This paper presents a specific data pre-processing case for educators who are keen on investigating student online reading behavior using sequential pattern analyses. The implications of data pre-processing are also discussed.

Keywords Log data · Data mining · Data pre-processing

1 Introduction

Educational data mining applies data mining algorithms to different types of educational data with the aim of resolving educational issues [1]. In a meta-analysis by [2], the most popular technique for data mining is clustering, followed by classification, sequential pattern, prediction, and association rule analysis. Each type of technique serves a different purpose, catering for different research needs. With any data mining technique, the very first step is always the transformation of data into an appropriate form for the mining process [3]. This data pre-processing allows raw data to be transformed into a shape suitable for resolving a problem using a specific data mining algorithm [4]. Nowadays, many data mining programs include data pre-processing tools, such as WEKA [5] or MySQL Administrator tools [6]. [7], [8], and [9] have identified a series of tasks to be addressed in data pre-processing:

• Data cleaning: Clear noise and irrelevant data;
• User identification: Identify and aggregate all the data associated with a particular user together;

M. Zhou() Faculty of Education, University of Macau, Macau, China e-mail: [email protected] © Springer Science+Business Media Singapore 2016 K.J. Kim and N. Joukov (eds.), Information Science and Applications (ICISA) 2016, Lecture Notes in Electrical Engineering 376, DOI: 10.1007/978-981-10-0557-2_96


• Session identification: Identify the onset and offset of a session, and combine all the data across users associated with a particular session together, or combine the data for a given user across multiple sessions;
• Event identification: Break down sessions into smaller units, referred to as events;
• Data filtering: Use specific criteria (e.g., one or more attributes) to filter out the data;
• Data integration: Integrate and synchronize data from multiple data sources;
• Data transformation: Convert data into forms that can be processed by selected data mining algorithms.
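To make these tasks concrete, the following is a minimal Python sketch of how several of them might be chained on a small e-learning log. It is an illustration under stated assumptions, not the procedure used in this paper: the log format (user ID, ISO timestamp, action), the action names, the 30-minute inactivity threshold for session boundaries, and all function names are hypothetical. Event identification and data integration are omitted because they depend on the specific platform and data sources.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw log lines in an assumed "user_id,ISO timestamp,action" format.
RAW_LOG = [
    "s01,2016-01-05T09:00:00,open_page",
    "s01,2016-01-05T09:02:10,highlight",
    "s02,2016-01-05T09:03:00,open_page",
    "s01,2016-01-05T09:02:10,highlight",   # exact duplicate (noise)
    "bad line without fields",              # malformed line (noise)
    "s01,2016-01-05T11:30:00,open_page",
    "s01,2016-01-05T11:31:00,heartbeat",   # assumed irrelevant to reading behavior
    "s02,2016-01-05T09:04:30,annotate",
]

SESSION_GAP = timedelta(minutes=30)   # assumed inactivity threshold
IRRELEVANT_ACTIONS = {"heartbeat"}    # assumed filter criterion


def clean(lines):
    """Data cleaning: drop malformed lines and exact duplicates."""
    seen, records = set(), []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3 or line in seen:
            continue
        seen.add(line)
        user, stamp, action = parts
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # unparseable timestamp is treated as noise
        records.append((user, when, action))
    return records


def group_by_user(records):
    """User identification: collect each user's events in time order."""
    by_user = defaultdict(list)
    for user, when, action in records:
        by_user[user].append((when, action))
    for events in by_user.values():
        events.sort()
    return dict(by_user)


def split_sessions(events, gap=SESSION_GAP):
    """Session identification: start a new session after a long pause."""
    sessions = []
    for when, action in events:
        if sessions and when - sessions[-1][-1][0] <= gap:
            sessions[-1].append((when, action))
        else:
            sessions.append([(when, action)])
    return sessions


def to_sequences(by_user):
    """Data filtering and transformation: one action sequence per session.

    Filtering happens before session splitting here, so removed actions
    do not count as activity when session boundaries are decided.
    """
    sequences = {}
    for user, events in by_user.items():
        kept = [e for e in events if e[1] not in IRRELEVANT_ACTIONS]
        sequences[user] = [[a for _, a in s] for s in split_sessions(kept)]
    return sequences


sequences = to_sequences(group_by_user(clean(RAW_LOG)))
```

The resulting `sequences` dictionary maps each user to a list of per-session action sequences, which is the kind of input that sequential pattern mining tools typically expect. Whether irrelevant actions should be filtered before or after session identification is a design choice that depends on the research question.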

[10] called for active participation by educators in the data pre-processing stage, so that pre-processing facilities can be enhanced to prepare e-learning data in a meaningful and useful manner. In this paper, I present a specific data pre-processing case for educators who are keen on investigating student online reading behavior using sequential patte