April 24, 2024

A Novel Machine Learning Approach for Handling Imbalanced Data: Leveraging SMOTE-ENN and XGBoost

Abstract

The healthcare fraud detection industry continues to grow, yet it faces notable obstacles, especially class imbalance in its data. Traditional machine learning (ML) techniques often handle this issue inadequately: Random Oversampling (ROS) leads to overfitting, the Synthetic Minority Oversampling Technique (SMOTE) generates noise, and Random Undersampling (RUS) discards data. Furthermore, choosing an optimal classifier is vital for efficient fraud detection. This paper proposes a novel hybrid method that combines SMOTE-ENN with the XGBoost classifier, specifically tailored for the Medicare Part B dataset. SMOTE-ENN pairs SMOTE with Edited Nearest Neighbors (ENN): SMOTE generates synthetic minority instances and ENN removes noisy samples, resulting in a more balanced dataset. Combining SMOTE-ENN's data balancing with XGBoost's predictive ability significantly improves model performance on imbalanced data. The proposed technique surpassed standard resampling strategies, with XGBoost achieving accuracy, precision, and AUC at 0.95, recall at 0.94, and an F1-score of 0.96. These findings affirm the method's efficiency in boosting model performance and managing imbalanced data effectively.
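The resampling step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`smote`, `enn`, `nearest`), the toy data, the choice of k = 3, and the majority-vote ENN rule are all assumptions made for illustration. In practice a library such as imbalanced-learn (which provides a `SMOTEENN` class) would normally be used, with the resampled data then passed to a classifier such as `xgboost.XGBClassifier`.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(i, X, k, pool=None):
    """Indices of the k samples closest to X[i], taken from pool (default: all), excluding i."""
    pool = range(len(X)) if pool is None else pool
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: sq_dist(X[i], X[j]))[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until the minority matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    need = max(counts.values()) - counts[minority]
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(need):
        i = rng.choice(min_idx)                      # a real minority sample
        j = rng.choice(nearest(i, X, k, min_idx))    # one of its minority neighbours
        gap = rng.random()                           # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Keep only samples whose label matches the majority vote of their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 40 majority points on a grid near the origin,
# one mislabeled (noisy) majority point inside the minority cluster,
# and 6 minority points around (5, 5).
majority = [[0.1 * i, 0.1 * j] for i in range(8) for j in range(5)]
noisy = [[5.0, 5.1]]
minority_pts = [[5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8], [5.3, 5.2], [4.8, 4.9]]
X = majority + noisy + minority_pts
y = [0] * 41 + [1] * 6

X_sm, y_sm = smote(X, y, minority=1)   # oversample the minority class
X_bal, y_bal = enn(X_sm, y_sm)         # then remove samples that disagree with their neighbours

print(Counter(y), Counter(y_sm), Counter(y_bal))
```

On this toy data SMOTE raises the minority class from 6 to 41 samples, and ENN then drops the single noisy majority point sitting inside the minority cluster, leaving 40 majority and 41 minority samples. The sketch uses a majority-vote rule for ENN; library implementations may default to a stricter rule that requires all neighbours to agree.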


Cite This Study

Bounab et al. studied this question.

synapsesocial.com/papers/68e6dd5db6db64358765930c https://doi.org/10.1109/pais62114.2024.10541220
