April 27, 2026 · Open Access

Boosting-Based Machine Learning for Efficient Income Tax Fraud Detection

Key Points

The research aims to evaluate the effectiveness of gradient boosting algorithms in detecting tax fraud compared to traditional methods.
Utilized three gradient boosting algorithms: AdaBoost, Gradient Boosting, and XGBoost.
Created a structured synthetic dataset of 1,000 taxpayer profiles with twelve financial and behavioral attributes.
Trained and benchmarked the boosting models against five traditional methods under identical conditions.
XGBoost achieved an R² of 0.9850, ranking second overall among the models tested.
Gradient Boosting and AdaBoost scored R² values of 0.9850 and 0.8560, respectively.
The findings imply that ensemble models outperform linear or proximity-based methods for tax-risk detection.
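
Every model in the study is scored by R² against an ordinally encoded tax-risk target, so it may help to see the metric spelled out. The following is a minimal plain-Python sketch. The three-level risk scale, the `RISK_LEVELS` mapping and the predictions are hypothetical illustrations, not data from the paper. scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
# Coefficient of determination: R² = 1 - SS_res / SS_tot.
# Illustrative sketch only; the labels and predictions below are hypothetical.

def r2_score(y_true, y_pred):
    """Return R² for paired, non-empty sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical ordinal encoding of a tax-risk label (low < medium < high).
RISK_LEVELS = {"low": 0, "medium": 1, "high": 2}

labels = ["low", "high", "medium", "low", "high"]
y_true = [RISK_LEVELS[label] for label in labels]
y_pred = [0.1, 1.9, 1.2, 0.0, 1.7]  # hypothetical model outputs
print(round(r2_score(y_true, y_pred), 4))  # → 0.9625
```

Encoding the ordered risk levels as integers is what lets a regression metric such as R² apply. A model that always predicts the mean risk level scores 0, and a perfect model scores 1.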

Abstract

Deliberate misreporting of revenues for tax purposes is a long-standing problem for fiscal authorities around the world. Traditional countermeasures, such as threshold-based rule engines and periodic manual reviews, cannot keep pace with increasingly diverse evasion techniques and growing data volumes. This paper proposes and benchmarks a machine-learning detection system based on three gradient boosting algorithms: AdaBoost, Gradient Boosting, and XGBoost. Experiments use a structured synthetic dataset of 1,000 taxpayer profiles with twelve financial and behavioral attributes; all boosting models were trained and compared against five traditional baselines under identical conditions. Results from a complete notebook run show that XGBoost achieves an R² of 0.9850, ranks second overall, and is significantly ahead of every non-boosting model except Random Forest. Gradient Boosting and AdaBoost reach R² values of 0.9850 and 0.8560, respectively. These results support the argument that iteratively constructed ensemble models are significantly better suited than linear or proximity-based methods to ordinally encoded tax-risk targets.
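
The abstract attributes the boosting models' advantage to the way they are built iteratively. As a rough illustration, here is a minimal gradient-boosting regressor in plain Python. It uses squared loss and depth-1 trees (stumps), and each round fits a stump to the current residuals. This is a generic textbook sketch, not the authors' pipeline. It also omits what distinguishes XGBoost (a regularized objective with second-order gradient statistics) and AdaBoost (sample reweighting). All function names and the toy data are hypothetical.

```python
# Minimal gradient boosting for regression: squared loss, depth-1 trees (stumps).
# Generic sketch for illustration; not the paper's code and not XGBoost.

def fit_stump(xs, residuals):
    """Pick the single-feature threshold split minimizing squared error on the residuals."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({row[j] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(xs, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no valid split (all rows identical): constant stump
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    _, j, threshold, left_mean, right_mean = best
    return (j, threshold, left_mean, right_mean)


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals, i.e. the negative gradient of squared loss."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it to the ensemble.
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Toy data: two hypothetical features; the target jumps from 0 to 2 when feature 0 exceeds 5.
xs = [[float(i), float(i % 3)] for i in range(10)]
ys = [0.0 if row[0] <= 5 else 2.0 for row in xs]
model = fit_gradient_boosting(xs, ys, n_rounds=100, learning_rate=0.1)
print([round(predict(model, row), 2) for row in xs])
# → [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
```

Each stump on its own is a weak model. Their shrunken sum, however, converges toward the target step, because every round corrects what the ensemble so far still gets wrong.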

Cite This Study

This study was conducted by Shariq et al.

synapsesocial.com/papers/69eefcaefede9185760d38c5
https://doi.org/10.56975/ijrti.v11i4.211654