A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach
Ricardo Costa-Mendes 1 & Tiago Oliveira 1 & Mauro Castelli 1 & Frederico Cruz-Jesus 1
Received: 29 February 2020 / Accepted: 27 August 2020 / © The Author(s) 2020
Abstract
This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education to compare the predictive power of the classic multilinear regression model with that of a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. The intent is a hybrid analysis in which classical statistical analysis and artificial intelligence algorithms are blended to strengthen the ability to reach valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the determinant of the base learner output correlation matrix increases, and the random forest feature importance empirical distributions are correlated with the structure of p-values and the statistical significance test findings of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.

Keywords Machine learning · Stacking · Random forest · Support vector regression · Academic achievement · High school grades
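The abstract's remark about the determinant of the base learner output correlation matrix can be illustrated with a minimal sketch. This is not the authors' code: the prediction values and the names `diverse` and `redundant` are hypothetical, invented for illustration only. It relies on a general property of correlation matrices: the determinant lies between 0 and 1, approaching 1 when the learners' outputs are nearly uncorrelated and approaching 0 when they are nearly collinear, in which case a stacking layer has little complementary information to combine.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(predictions):
    """Correlation matrix of base learner outputs (one list per learner)."""
    k = len(predictions)
    return [[1.0 if i == j else pearson(predictions[i], predictions[j])
             for j in range(k)] for i in range(k)]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Hypothetical grade predictions (0-20 scale) from three base learners.
diverse = [
    [12.0, 14.0, 9.0, 16.0, 11.0, 13.0],
    [13.0, 10.0, 12.0, 15.0, 9.0, 14.0],
    [10.0, 13.0, 11.0, 12.0, 15.0, 9.0],
]
redundant = [
    [12.0, 14.0, 9.0, 16.0, 11.0, 13.0],
    [12.1, 14.2, 9.1, 15.8, 11.0, 13.1],
    [11.9, 13.9, 9.2, 16.1, 10.9, 12.8],
]

det_diverse = determinant(correlation_matrix(diverse))
det_redundant = determinant(correlation_matrix(redundant))
print(f"diverse learners:   det = {det_diverse:.4f}")
print(f"redundant learners: det = {det_redundant:.6f}")
```

In this toy setting the nearly identical learners yield a determinant close to zero, while the less correlated set yields a clearly larger one. Whether this matches the exact diagnostic used in the article cannot be confirmed from the abstract alone.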
* Ricardo Costa-Mendes [email protected]
Extended author information available on the last page of the article
Education and Information Technologies
1 Introduction
A nation’s wealth, interconnected with the availability of human capital in its economy, hinges on its citizens’ academic achievement and generally on the education system attainment level (Becker 1964; Hanushek and Wößmann 2010; Strenze 2007). Knowing the determinants of academic success in detail is an essential cornerstone in the pursuit of appropriate public policy designs. The ability to predict and anticipate student academic grades would enable policymakers, principals, and teachers to take timely action on preventing unfavourable results and provide a readily available solid conceptual framework, capable of feeding sound decision support systems (van der Scheer and Visscher 2018). The development and management of a nationwide schooling and education system database which brings together relevant information on the determinants of academic achievement is an investment that requires attention to the complexity of data collection and management at the school-teacher-student trinomial level. Still, it is an indispensable step in promoting conceptually well-designed