The insurance industry confronts two analytically critical and financially consequential challenges: accurate prediction of claim settlement amounts and timely detection of fraudulent claims. Conventional approaches (rule-based heuristics, logistic regression scorecards, and manual adjuster assessments) are demonstrably inadequate for capturing the nonlinear, high-dimensional interactions that characterise modern insurance claim data. This paper presents ClaimSmart AI, a comprehensive, modular, end-to-end machine learning pipeline that addresses both challenges within a unified analytical framework. The system operates on a synthetically generated dataset of 15,000 insurance claim records encompassing 19 attributes spanning policyholder demographics, policy characteristics, vehicle parameters, claim specifics, and behavioural indicators. A dual-model architecture employs a Random Forest Regressor (150 estimators) for claim amount prediction and a Random Forest Classifier (150 estimators, balanced class weights) for binary fraud risk detection, both trained on a stratified 80/20 holdout split with StandardScaler feature normalisation and LabelEncoder categorical transformation. The regression model achieves a Mean Absolute Error below INR 15,000 and an R-squared coefficient of determination exceeding 0.70, while the classification model delivers accuracy above 0.80, fraud-class recall exceeding 0.74, and F1-Score above 0.76, surpassing logistic regression and rule-based baselines on equivalent evaluation protocols. Prediction outputs are enriched with four derived business metrics (predicted claim amount, claim variance, fraud risk probability, and a three-tier fraud risk category) and persisted to a MySQL relational database for direct consumption by Power BI and enterprise analytics platforms. Eight publication-quality visualisation charts provide comprehensive analytical coverage from fraud distribution and regional heatmaps to actual-versus-predicted scatter analysis.
A mysqldump-format SQL export module ensures enterprise portability and regulatory archival compliance. The complete pipeline executes through a single orchestration script, establishing ClaimSmart AI as both a rigorous academic contribution and a practical template for production insurance analytics deployment.
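The enrichment step that produces the four derived business metrics could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract defines neither "claim variance" nor the tier boundaries, so the sketch assumes variance is the actual amount minus the predicted amount, and it uses hypothetical cut-offs of 0.30 and 0.70 for the Low/Medium/High tiers. The names `EnrichedClaim`, `risk_category`, and `enrich` are likewise illustrative.

```python
from dataclasses import dataclass

# Hypothetical tier boundaries; the abstract does not state the real ones.
LOW_MAX = 0.30   # probabilities below this are "Low"
HIGH_MIN = 0.70  # probabilities at or above this are "High"


@dataclass
class EnrichedClaim:
    """The four derived business metrics named in the abstract."""
    predicted_claim_amount: float
    claim_variance: float
    fraud_risk_probability: float
    fraud_risk_category: str


def risk_category(probability: float) -> str:
    """Map a fraud probability in [0, 1] to one of three tiers."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_MAX:
        return "Low"
    if probability < HIGH_MIN:
        return "Medium"
    return "High"


def enrich(actual_amount: float, predicted_amount: float,
           fraud_probability: float) -> EnrichedClaim:
    """Combine regressor and classifier outputs into one enriched record.

    Assumption: claim variance = actual amount - predicted amount.
    """
    return EnrichedClaim(
        predicted_claim_amount=round(predicted_amount, 2),
        claim_variance=round(actual_amount - predicted_amount, 2),
        fraud_risk_probability=round(fraud_probability, 4),
        fraud_risk_category=risk_category(fraud_probability),
    )


if __name__ == "__main__":
    # Example: a claim of INR 120,000 predicted at INR 100,000
    # with a fraud probability of 0.82.
    print(enrich(120000.0, 100000.0, 0.82))
```

In a pipeline like the one described, `predicted_amount` would come from the regressor and `fraud_probability` from the classifier's positive-class probability; each enriched record would then map onto a row of the MySQL output table.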
Basha et al. (Thu,) studied this question.