What question did this study set out to answer?

The study aims to improve the ranking of start-ups for venture capital investment by addressing data leakage and severe class imbalance.

March 25, 2026 · Open Access

Leakage-Aware Time-Based Top-K Start-Up Ranking for Venture Capital Investment Success Under Severe Class Imbalance Conditions: A Screening Evaluation Framework

Key Points

The study develops a time-based evaluation framework for top-K ranking, using a dataset of 117,141 early-stage firms as an empirical testbed.
Features are constructed strictly as of a reference time t0, and a 180-day temporal embargo is enforced around the train–test boundary to prevent leakage.
Ranking quality is evaluated with PR-AUC, Lift@K, Precision@K/Recall@K, and NDCG@K, reported with bootstrap confidence intervals.
Under the leakage-aware protocol, maturity-related signals achieve the strongest PR-AUC (0.0144).
Team and combined signals yield the best top-50 shortlist concentration.
Sigmoid probability calibration reduces the Brier score from 0.0972 to 0.0161, improving reliability for threshold planning while leaving ranking essentially unchanged.
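
The embargoed, time-ordered split named in the key points can be illustrated with a minimal sketch. It is not the study's code: the record layout, the `embargoed_time_split` helper, and the choice to drop records within 180 days on either side of the cutoff are assumptions made for illustration.

```python
from datetime import date, timedelta


def embargoed_time_split(records, cutoff, embargo_days=180):
    """Split records into train/test by reference date t0, dropping an embargo band.

    Assumption: the embargo is symmetric, so any record whose t0 lies within
    `embargo_days` of `cutoff` on either side is discarded. The study's exact
    rule may differ.
    """
    gap = timedelta(days=embargo_days)
    train = [r for r in records if r["t0"] < cutoff - gap]
    test = [r for r in records if r["t0"] >= cutoff + gap]
    return train, test


# Hypothetical firms, each with a reference date t0.
firms = [
    {"id": "a", "t0": date(2018, 1, 15)},  # well before the cutoff: train
    {"id": "b", "t0": date(2019, 11, 1)},  # inside the embargo band: dropped
    {"id": "c", "t0": date(2020, 3, 1)},   # inside the embargo band: dropped
    {"id": "d", "t0": date(2021, 2, 1)},   # well after the cutoff: test
]

train, test = embargoed_time_split(firms, cutoff=date(2020, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Dropping the band costs some data, but it keeps information recorded shortly after the cutoff from influencing models that are evaluated on later firms.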

Abstract

Many real-world screening tasks in venture capital must rank large start-up candidate pools under conditions of tight review capacity, time-varying information, and rare investment success outcomes. When datasets are constructed retrospectively, post-decision updates can leak into features and inflate performance, especially with random splits. This study proposes a leakage-aware, time-based evaluation framework for capacity-constrained screening formulated as a top-K ranking problem. Using a dataset of 117,141 early-stage firms as an empirical testbed, features were constructed strictly as of a reference time t0, a 180-day temporal embargo was enforced around the train–test boundary, and generalization was assessed with time-ordered splits. Because venture capital decisions are made on a shortlist, evaluation emphasizes ranking quality using PR-AUC, Lift@K, Precision@K/Recall@K, and NDCG@K, reported with bootstrap confidence intervals. Under this leakage-aware protocol and with strong class imbalance, maturity-related signals achieve the strongest PR-AUC (0.0144), while team and combined signals yield the best top-50 shortlist concentration. Finally, probability calibration substantially improves reliability for threshold planning (Brier score reduced from 0.0972 to 0.0161 with sigmoid calibration) while leaving ranking essentially unchanged. Overall, the study provides a leakage-aware evaluation template and an interpretable baseline for time-dependent venture capital screening tasks involving start-up selection, investment success prediction, leakage risk, and limited review capacity.
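
The shortlist metrics named in the abstract can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the study's implementation: labels are binary, NDCG uses the common log2 discount, the confidence interval is a simple percentile bootstrap, and all function names and the toy data are hypothetical. PR-AUC and Recall@K are omitted for brevity.

```python
import math
import random


def top_k_indices(scores, k):
    """Indices of the k highest-scoring candidates (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def precision_at_k(scores, labels, k):
    """Share of the top-k shortlist that are true successes."""
    return sum(labels[i] for i in top_k_indices(scores, k)) / k


def lift_at_k(scores, labels, k):
    """Precision@K divided by the overall base rate of successes."""
    base_rate = sum(labels) / len(labels)
    return precision_at_k(scores, labels, k) / base_rate


def ndcg_at_k(scores, labels, k):
    """NDCG@K with binary gains and a log2 position discount."""
    dcg = sum(labels[i] / math.log2(rank + 2)
              for rank, i in enumerate(top_k_indices(scores, k)))
    ideal = sorted(labels, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def bootstrap_ci(metric, scores, labels, k, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample candidates with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([scores[i] for i in idx], [labels[i] for i in idx], k))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Toy data: 10 candidates, 2 successes, shortlist capacity K = 2.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

p_at_2 = precision_at_k(scores, labels, 2)
lift_2 = lift_at_k(scores, labels, 2)
ndcg_2 = ndcg_at_k(scores, labels, 2)
ci_low, ci_high = bootstrap_ci(precision_at_k, scores, labels, 2)
print(p_at_2, lift_2, round(ndcg_2, 4), (ci_low, ci_high))
```

With rare positives, some bootstrap resamples contain no successes at all, so the sketch bootstraps Precision@K rather than Lift@K, whose denominator would be zero in those resamples.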
