Poverty is a crucial development challenge in Indonesia, including in the regencies and cities of Kalimantan, which require more attention. Because poverty is influenced by many factors, this research compares the accuracy of basic statistical and machine learning models in predicting poverty rates and identifies the factors that affect those rates. The contribution of this research is the performance comparison combined with the handling of missing data. The three models compared are binary logistic regression with backward stepwise selection, random forest, and extremely randomized trees (extra trees). The data are secondary data from Statistics Indonesia (BPS) covering the five provinces of Kalimantan; during pre-processing, missing values are imputed with k-nearest neighbors (KNN). The results show that binary logistic regression is the most accurate of the three models, with a balanced accuracy of 75%, ahead of random forest and extra trees. Based on this best-performing model, the study also identifies three significant predictors of the poverty rate in the regencies and cities of Kalimantan: population density, average years of schooling, and per capita expenditure on food.
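Balanced accuracy, the metric reported above, is the mean of the per-class recalls, so it does not reward a model for simply predicting the majority class. The following is a minimal sketch of that definition in plain Python. The function name and the labels are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = "high poverty" regency/city, 0 = otherwise.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # a model that always predicts class 0

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this made-up example, plain accuracy looks respectable even though the minority class is never detected, while balanced accuracy falls to chance level. That is one reason the metric suits data in which the classes are unevenly represented.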
Khikmah et al. studied this question.
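The workflow summarized in the abstract (KNN imputation of missing values, then a comparison of three classifiers by balanced accuracy) could be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function name `compare_models`, the 70/30 split, the number of neighbors, and all hyperparameters are assumptions, and backward stepwise selection for the logistic regression is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Fit three classifiers after KNN imputation; return balanced accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "extra_trees": ExtraTreesClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, clf in models.items():
        # Scale first (StandardScaler ignores NaNs when fitting and passes them
        # through), so the KNN distances are not dominated by one feature.
        pipe = make_pipeline(StandardScaler(), KNNImputer(n_neighbors=5), clf)
        pipe.fit(X_train, y_train)
        scores[name] = balanced_accuracy_score(y_test, pipe.predict(X_test))
    return scores


# Synthetic stand-in for the regency/city indicators (NOT the BPS data).
X, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan  # knock out about 10% of values at random

scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Placing the imputer inside the pipeline means it is fitted on the training split only, which avoids leaking information from the test split into the imputed values.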