Healthcare fraud poses significant risks to insurance systems, undermining both financial sustainability and equitable access to care. Accurate detection of fraudulent claims is therefore critical to the integrity of healthcare insurance operations. However, increasingly sophisticated fraud techniques and limited data availability have degraded the performance of traditional detection approaches. To address these challenges, this paper proposes WCGAN-GA-RF, an integrated fraud detection framework that combines a Wasserstein Conditional Generative Adversarial Network with gradient penalty (WCGAN-GP) for synthetic data generation, genetic algorithm-based feature selection (GA-RF) for dimensionality reduction, and Random Forest (RF) for classification. The framework was empirically validated on a real-world dataset of 16,000 healthcare insurance claims from a Chinese healthcare technology firm, characterized by a 16:1 class imbalance ratio (5.9% fraudulent samples) and 118 original features. Using a stratified 80/20 train–test split with results averaged over five independent runs, the WCGAN-GA-RF framework achieved a precision of 96.47±0.5%, a recall of 97.05±0.4%, and an F1-score of 96.26±0.4%. Notably, the GA-RF component reduced the feature set by 65% (from 80 to 28 features) while maintaining competitive detection accuracy. Comparative experiments demonstrate that the proposed approach outperforms conventional oversampling methods, including Random Oversampling (ROS), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN), particularly on high-dimensional, severely imbalanced healthcare fraud data.
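The evaluation protocol described above (a stratified 80/20 split, with precision, recall, and F1 reported as mean ± deviation over repeated runs) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the helper names `stratified_split`, `precision_recall_f1`, and `summarize` are hypothetical, the labels are toy data built to mimic a 16:1 imbalance, and the use of the sample standard deviation is an assumption, since the abstract does not say how the ± values were computed.

```python
import random
import statistics


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def summarize(values):
    """Mean and sample standard deviation across runs (assumed convention)."""
    return statistics.mean(values), statistics.stdev(values)


# Toy labels only: 1,600 legitimate and 100 fraudulent claims (16:1).
labels = [0] * 1600 + [1] * 100
fraud_share = sum(labels) / len(labels)

train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=42)
test_fraud = sum(labels[i] for i in test_idx)

print(f"fraud share: {fraud_share:.1%}")
print(f"test size: {len(test_idx)}, fraud cases in test: {test_fraud}")
```

A 16:1 ratio means one fraudulent claim per 17 claims, which is where the roughly 5.9% figure comes from. In a full pipeline, the split would be repeated with a different seed for each run, a classifier would be trained on the training portion of each split, and `summarize` would be applied to the per-run metric lists.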
Cai et al. studied this question.