House price prediction is crucial in real estate for informed decision-making. This paper presents an automated prediction system that combines genetic algorithms (GA) for feature optimization and Analysis of Variance (ANOVA) for statistical analysis. We apply and compare five ensemble machine learning (ML) models, namely Extreme Gradient Boosting Regression (XGBR), random forest regression (RFR), Categorical Boosting Regression (CBR), Adaptive Boosting Regression (ADBR), and Gradient Boosted Decision Trees Regression (GBDTR), on a comprehensive dataset. We used a dataset with 1000 samples with eight features and a secondary dataset for external validation with 3865 samples. Our integrated approach identifies Categorical Boosting with GA (CBRGA) as the best performer, achieving an R2 of 0.9973 and outperforming existing state-of-the-art methods. ANOVA-based analysis further enhances model interpretability and performance by isolating key factors such as square footage and lot size. To ensure robustness and transparency, we conduct 10-fold cross-validation and employ explainable AI techniques such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), providing insights into model decision-making and feature importance.
Hussain et al. (Fri,) studied this question.