What question did this study set out to answer?

The study aims to enhance the detection of healthcare fraud by developing an integrated framework combining advanced machine learning techniques.

March 27, 2026 · Open Access

WCGAN-GA-RF: Healthcare Fraud Detection via Generative Adversarial Networks and Evolutionary Feature Selection

Key points

Developed a WCGAN-GP for synthetic data generation.
Implemented a genetic algorithm-based feature selection to reduce data dimensionality.
Used Random Forest for classifying fraudulent and non-fraudulent claims.
Validated the framework using a dataset of 16,000 healthcare claims with class imbalance.
Achieved a precision of 96.47% and a recall of 97.05%.
Attained an F1-score of 96.26%.
Reduced features from 80 to 28 while maintaining high accuracy.
Outperformed conventional oversampling methods in fraud detection tasks.
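
The genetic-algorithm feature selection step listed above can be illustrated with a minimal sketch. It evolves binary masks over 80 features using tournament selection, single-point crossover, bit-flip mutation, and elitism. Everything except the feature count is an assumption made for illustration: the `fitness` function is a toy stand-in (a real pipeline would more plausibly score a classifier on the selected columns), and `INFORMATIVE`, the population size, the mutation rate, and the generation count are hypothetical values, not settings reported by the study.

```python
import random

N_FEATURES = 80                      # feature count before selection, as reported
INFORMATIVE = set(range(0, 80, 3))   # hypothetical: pretend every third feature is useful

def fitness(mask):
    """Toy fitness (assumption): reward informative features, penalise mask size.

    Placeholder only; a real pipeline would score a classifier
    (for example cross-validated F1) on the selected columns.
    """
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.25 * sum(mask)

def tournament(pop, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """Single-point crossover of two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]

def run_ga(pop_size=40, generations=60, seed=0):
    """Evolve feature masks; return the best mask and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)        # elitism: the best mask always survives
        history.append(fitness(elite))
        children = [elite]
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best_mask, history = run_ga()
selected = [i for i, bit in enumerate(best_mask) if bit]
print(len(selected), "features kept; best fitness", history[-1])
```

Because the elite mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. The size penalty in the toy fitness is what pushes the search toward smaller subsets.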

Abstract

Healthcare fraud poses significant risks to insurance systems, undermining both financial sustainability and equitable access to care. Accurate detection of fraudulent claims is therefore critical to the integrity of healthcare insurance operations. However, increasingly sophisticated fraud techniques and limited data availability have degraded the performance of traditional detection approaches. To address these challenges, this paper proposes WCGAN-GA-RF, an integrated fraud detection framework that combines a Wasserstein Conditional Generative Adversarial Network with gradient penalty (WCGAN-GP) for synthetic data generation, genetic algorithm-based feature selection (GA-RF) for dimensionality reduction, and a Random Forest (RF) classifier. The framework was validated on a real-world dataset of 16,000 healthcare insurance claims from a Chinese healthcare technology firm, characterized by a 16:1 class imbalance ratio (5.9% fraudulent samples) and 118 original features. Using a stratified 80/20 train–test split with results averaged over five independent runs, WCGAN-GA-RF achieved a precision of 96.47 ± 0.5%, a recall of 97.05 ± 0.4%, and an F1-score of 96.26 ± 0.4%. Notably, the GA-RF component achieved a 65% feature reduction (from 80 to 28 features) while maintaining competitive detection accuracy. Comparative experiments show that the proposed approach outperforms conventional oversampling methods, including Random Oversampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN), particularly on high-dimensional, severely imbalanced healthcare fraud data.
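
The dataset figures in the abstract can be checked with a short sketch, which also illustrates a stratified 80/20 split and the simplest baseline named there, Random Oversampling (ROS). This is an illustration under stated assumptions, not the authors' code: the class counts are only what a 16:1 ratio implies for 16,000 claims (the exact counts are not given in the abstract), the labels are synthetic placeholders, and `stratified_split` and `random_oversample` are hypothetical helper names.

```python
import random
from collections import Counter

N_CLAIMS = 16_000
RATIO = 16                                   # legitimate : fraudulent = 16 : 1

fraud_share = 1 / (RATIO + 1)                # 1/17, which rounds to 5.9%
n_fraud = round(N_CLAIMS * fraud_share)      # implied estimate, not a reported count
n_legit = N_CLAIMS - n_fraud
reduction = (80 - 28) / 80                   # share of features removed by selection

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test

def random_oversample(indices, labels, seed=0):
    """ROS: duplicate minority samples at random until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [i for i in indices if labels[i] == 1]
    majority = [i for i in indices if labels[i] == 0]
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

labels = [1] * n_fraud + [0] * n_legit       # synthetic labels: 1 = fraud, 0 = legitimate
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
balanced_idx = random_oversample(train_idx, labels)
balanced_counts = Counter(labels[i] for i in balanced_idx)

print(f"fraud share: {fraud_share:.1%}, feature reduction: {reduction:.0%}")
print("test split:", dict(test_counts))
print("training split after ROS:", dict(balanced_counts))
```

Oversampling is applied to the training indices only, so the test split keeps the original imbalance and any evaluation on it reflects the real class distribution.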
