Background: Lung cancer remains one of the leading causes of cancer-related mortality worldwide, primarily because it is diagnosed late. Although machine learning (ML) techniques have been widely applied to lung cancer classification, many studies lack a fully optimized end-to-end pipeline built on routine clinical data. This study proposes an optimized ML framework that combines demographic, lifestyle, and clinical features with systematic hyperparameter tuning to improve classification performance.

Methods: A dataset of 309 patient records containing demographic, lifestyle, and clinical attributes was used. The data were preprocessed and split into training and testing sets in an 80:20 ratio. Feature selection was performed using four metaheuristic algorithms: Red Deer Optimization, Binary Grasshopper Optimization, Gray Wolf Optimization, and Bee Colony Optimization. Six ML classifiers (Logistic Regression, Support Vector Classifier, Gradient Boosting, Random Forest, K-Nearest Neighbors, and Gaussian Naive Bayes) were trained with optimized hyperparameters. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC–AUC.

Results: The optimized pipeline improved classification performance. Logistic Regression achieved the highest accuracy, 91.07%, with an AUC of 0.91, outperforming the more complex ensemble models. Gradient Boosting and Random Forest each achieved an accuracy of 87.5%, while the remaining classifiers showed moderate performance.

Conclusions: The proposed optimized ML pipeline improves lung cancer classification accuracy using routine clinical data. The results suggest that simpler, well-optimized models can outperform more complex approaches on structured datasets. The framework shows potential for early lung cancer risk screening and clinical decision support, although further validation on larger datasets is recommended.
Ansari et al. studied this question.
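The 80:20 split and the evaluation metrics named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`split_80_20`, `classification_metrics`, `roc_auc`), the random seed, and the choice to round the test-set size up are all hypothetical, and feature selection and hyperparameter tuning are not shown.

```python
import math
import random


def split_80_20(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test).

    Assumption: the test-set size is rounded up, so 309 records
    give 62 test records and 247 training records.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.95]

    train, test = split_80_20(range(309))
    print(len(train), len(test))
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The pairwise ROC-AUC computation is quadratic in the number of records, which is acceptable for a test set of this size; a rank-based formulation would be preferable for large datasets.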