This paper presents a machine learning approach to predicting startup survival from structured venture capital data derived from Crunchbase. The dataset comprises more than 17,000 startups and is transformed into a balanced classification problem through preprocessing and feature engineering. Several models, including Logistic Regression, Random Forest, and Gradient Boosting, are evaluated on accuracy, precision, recall, F1-score, and ROC-AUC. The experiments show that the ensemble methods outperform the linear model, reaching a maximum ROC-AUC of 0.74. Feature importance analysis identifies company age and funding-related variables as the strongest predictors of startup success. The study also emphasizes the inherent uncertainty of startup outcomes and the limitations of structured data. The entire machine learning pipeline is designed to be reproducible, so that the methodology can be extended and validated in future research.
Dubey et al. studied this question.
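The evaluation metrics named in the abstract can be made concrete with a small sketch. The code below is not the authors' pipeline: the function names `roc_auc` and `classification_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here for illustration, and it uses only the Python standard library. It computes ROC-AUC as the fraction of positive-negative pairs in which the positive example receives the higher score (ties count one half), and derives accuracy, precision, recall, and F1-score from the confusion counts at the chosen threshold.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 at a threshold, plus ROC-AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc(y_true, scores),
    }


# Toy data invented for this sketch: 1 = survived, 0 = did not survive.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(classification_metrics(toy_labels, toy_scores))
```

On this toy input the threshold-based metrics all equal 0.75, while ROC-AUC is 0.8125 because 13 of the 16 positive-negative pairs are ranked correctly. The contrast illustrates why a threshold-free measure such as ROC-AUC is reported alongside accuracy when comparing models.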