September 10, 2025

Developing a predictive model for student academic performance using machine learning techniques

Key Points

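XGBoost achieved the strongest results (R² = .70, MAE = 7.19), narrowly ahead of Random Forest (R² = .69, MAE = 7.74) and well ahead of the Linear Regression baseline (R² = .32, MAE = 13.79).
Feature importance analysis showed that XGBoost emphasized subgroup characteristics such as Students with Disabilities, English Learners, and At-Risk populations, whereas Random Forest relied more on institutional factors such as Framework Points Earned.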
Ensemble methods like Random Forest and XGBoost effectively captured complex interactions within educational data structures.
Recommendations emphasize fairness in model deployment and the necessity for equitable educational interventions.

Abstract

This study investigates the predictive capability of machine learning techniques in forecasting student academic performance using school-level and demographic data. A structured, publicly available dataset from the District of Columbia Public Schools was employed, comprising 1,163 records representing various student groups and institutional contexts. After preprocessing and feature selection, three regression models were developed and evaluated: a baseline Linear Regression model, a Random Forest Regressor, and an XGBoost Regressor. The baseline model demonstrated limited predictive strength (R² = .32, MAE = 13.79), while the ensemble models substantially outperformed it. Random Forest achieved an R² of .69 and MAE of 7.74, capturing complex interactions more effectively. XGBoost slightly outperformed Random Forest with an R² of .70 and MAE of 7.19, showing stronger generalization and sensitivity to underrepresented groups. Feature importance analysis revealed that institutional factors such as Framework Points Earned strongly influenced predictions in Random Forest, whereas XGBoost emphasized subgroup characteristics, including Students with Disabilities, English Learners, and At-Risk populations. These findings highlight the strengths of ensemble methods in modeling non-linear and multidimensional educational data while raising questions about the trade-offs between model accuracy and equity. The study concludes that predictive models should be evaluated not only by statistical performance but also by their capacity to inform equitable interventions in education. Recommendations include the ethical deployment of predictive systems, incorporation of contextual data, and prioritization of fairness in model selection to support inclusive, data-informed educational policy and practice.
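
For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how R² and MAE are conventionally computed. It is not the study's code. The function names and the toy values are assumptions chosen for illustration, and none of the numbers come from the District of Columbia Public Schools dataset.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values for illustration only (not from the study).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 60.0, 65.0, 80.0]

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MAE = {mae:.2f}, R^2 = {r2:.2f}")  # prints "MAE = 2.50, R^2 = 0.90"
```

Read this way, an R² of .70 means the model accounts for about 70% of the variance in the outcome relative to always predicting the mean, and an MAE of 7.19 means predictions are off by about 7.19 units of the outcome variable on average.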
