July 1, 2024 | Open Access

Evaluating Model Performance Under Worst-case Subpopulations

Abstract

The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performance of a model over all subpopulations of a given size, defined with respect to core attributes Z. This notion of robustness can consider arbitrary (continuous) attributes Z, and automatically accounts for complex intersectionality in disadvantaged groups. We develop a scalable yet principled two-stage estimation procedure that can evaluate the robustness of state-of-the-art models. We prove that our procedure enjoys several finite-sample convergence guarantees, including dimension-free convergence. Instead of overly conservative notions based on Rademacher complexities, our evaluation error depends on the dimension of Z only through the out-of-sample error in estimating the performance conditional on Z. On real datasets, we demonstrate that our method certifies the robustness of a model and prevents deployment of unreliable models.
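As an illustration of the two-stage idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the worst-case average over subpopulations of proportion at least alpha can be computed as the mean of the largest alpha-fraction of the estimated conditional losses (a conditional value-at-risk form). The function names `worst_case_subpop_mean` and `two_stage_estimate`, the 50/50 sample split, and the linear least-squares regressor used in the first stage are all hypothetical choices made for this example.

```python
import numpy as np


def worst_case_subpop_mean(mu, alpha):
    """Mean of the worst alpha-fraction of the values in mu.

    Assumption for this sketch: the worst-case subpopulation of
    proportion alpha concentrates on the largest conditional losses,
    with a fractional weight on the boundary point.
    """
    mu = np.sort(np.asarray(mu, dtype=float))[::-1]  # largest losses first
    n = mu.size
    k = alpha * n                    # (possibly fractional) subpopulation size
    full = int(np.floor(k))          # points that receive full weight
    total = mu[:full].sum()
    if full < n:
        total += (k - full) * mu[full]  # fractional weight on the next point
    return float(total / k)


def two_stage_estimate(Z, losses, alpha, seed=0):
    """Hypothetical two-stage estimate of worst-case subpopulation loss.

    Stage 1: fit a model of the loss given Z on one half of the data
             (a linear least-squares fit stands in for any regressor).
    Stage 2: predict conditional losses on the held-out half and average
             the worst alpha-fraction of those predictions.
    """
    Z = np.asarray(Z, dtype=float)
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    perm = np.random.default_rng(seed).permutation(n)
    fit_idx, eval_idx = perm[: n // 2], perm[n // 2:]

    # Stage 1: regress observed losses on the core attributes Z.
    X_fit = np.column_stack([np.ones(fit_idx.size), Z[fit_idx]])
    coef, *_ = np.linalg.lstsq(X_fit, losses[fit_idx], rcond=None)

    # Stage 2: plug the out-of-sample predictions into the worst-case mean.
    X_eval = np.column_stack([np.ones(eval_idx.size), Z[eval_idx]])
    mu_hat = X_eval @ coef
    return worst_case_subpop_mean(mu_hat, alpha)
```

With alpha = 1 the estimate reduces to the average predicted loss on the held-out half, and it can only increase as alpha shrinks, which matches the intuition that smaller subpopulations can be worse off. In practice the first-stage regressor would be whatever model predicts the loss from Z well out of sample, since the abstract ties the evaluation error to that out-of-sample error.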

Cite This Study

Li et al. studied this question.

synapsesocial.com/papers/68e61caeb6db6435875af573 https://doi.org/10.48550/arxiv.2407.01316
