Predicting Student Performance from Combined Data Sources


Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova and Petr Knoth

Abstract This chapter will explore the use of predictive modeling methods for identifying students who will benefit most from tutor interventions. This is a growing area of research and is especially useful in distance learning where tutors and students do not meet face to face. The methods discussed will include decision-tree classification, support vector machine (SVM), general unary hypotheses automaton (GUHA), Bayesian networks, and linear and logistic regression. These methods have been trialed through building and testing predictive models using data from several Open University (OU) modules. The Open University offers a good test-bed for this work, as it is one of the largest distance learning institutions in Europe. The chapter will discuss how the predictive capacity of the different sources of data changes as the course progresses. It will also highlight the importance of understanding how a student’s pattern of behavior changes during the course.

Keywords Predictive modeling · Student outcome · Education · Virtual learning environment



Abbreviations
ANOVA  Analysis of variance
CMS    Course management system

A. Wolff · Z. Zdrahal · D. Herrmannova · P. Knoth
Knowledge Media Institute, The Open University, Milton Keynes, MK7 6AA, UK

A. Peña-Ayala (ed.), Educational Data Mining, Studies in Computational Intelligence 524, DOI: 10.1007/978-3-319-02738-8_7, © Springer International Publishing Switzerland 2014


CS     Course signals
GUHA   General unary hypotheses automaton
MOOC   Massive open online course
OU     Open University
SVM    Support vector machine
TMA    Tutor marked assessment
VLE    Virtual learning environment

7.1 Introduction

Predicting student performance early enough to intervene, and so improve outcomes and reduce dropout or failure, benefits both students and teaching institutions. In traditional classroom learning, tutors use a range of information sources to judge whom to help, including their personal interactions with the students. In distance education, where students interact with learning materials on a virtual learning environment (VLE), machine-learning methods can be applied to combined sources of student data to predict which students will benefit most from an intervention, helping tutors judge whom to offer assistance to. Whilst VLEs have been used to deliver course materials for quite some time, their use for really large-scale delivery is a recent phenomenon. Previous course statistics have focused largely on providing data for a whole course, after completion, using only demographic data and historical analysis. For example, Kabra and Bichkar [1] use decision trees to predict failing engineering students, using past performance as th