May 27, 2026 | Open Access

An automatic finite-sample robustness metric: when can dropping a little data change conclusions? Part I: definitions and experiments

Abstract

Study samples often differ in non-random ways from the target populations to which policy decisions will eventually be applied. Researchers typically hope that such departures from random sampling, due to changes in the population over time and space or to difficulties in sampling truly randomly, are small, and that their impact on inference is correspondingly small. Accordingly, researchers might be concerned if the conclusions of their studies are excessively sensitive to a very small proportion of their sample data. We propose a method to assess the sensitivity of applied conclusions to the removal of a small fraction of the sample. Manually checking the influence of all possible small subsets is computationally infeasible, so we use an approximation to find the most influential subset. Our metric, the 'Approximate Maximum Influence Perturbation', is based on the classical influence function. It is automatically computable for common methods including (but not limited to) ordinary least squares, instrumental variables regression, maximum likelihood, generalized method of moments and variational Bayes. At minimal extra cost, we provide an exact finite-sample lower bound on sensitivity. While some empirical applications are robust, we show that the results of several influential economics papers can be overturned by removing less than 1% of the sample. This article is part of the theme issue 'Statistical workflow'.
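To make the idea concrete, the following is a minimal sketch for ordinary least squares only. It is not the authors' implementation; the function name `approx_max_influence_drop` and its arguments are hypothetical, chosen for illustration. It assumes the standard first-order result that the derivative of the OLS coefficient vector with respect to the weight on observation n, evaluated at unit weights, is (X'X)^{-1} x_n e_n, where e_n is the fitted residual. The sketch ranks observations by that influence, drops the fraction alpha with the largest positive influence on one coefficient, and then refits without them. The refit gives the exact change achieved by that particular subset, which is a lower bound on the largest change any subset of the same size could produce.

```python
import numpy as np

def approx_max_influence_drop(X, y, coef_index, alpha):
    """Hypothetical helper: approximate the subset of size floor(alpha * N)
    whose removal most DECREASES the OLS coefficient at `coef_index`."""
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y            # full-sample OLS fit
    resid = y - X @ beta                # full-sample residuals

    # Influence of each observation on the chosen coefficient:
    # d beta_k / d w_n at w = 1 equals [(X'X)^{-1} x_n]_k * e_n.
    influence = (X @ XtX_inv[:, coef_index]) * resid

    # Dropping observation n (weight 1 -> 0) changes beta_k by roughly
    # -influence[n], so the largest positive influences decrease it most.
    m = int(np.floor(alpha * n))
    drop = np.argsort(-influence)[:m]
    approx_change = -influence[drop].sum()

    # Refit without the selected observations to get the exact change
    # for this subset (a lower bound on the worst-case sensitivity).
    keep = np.setdiff1d(np.arange(n), drop)
    beta_refit = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    exact_change = beta_refit[coef_index] - beta[coef_index]
    return drop, approx_change, exact_change
```

Calling the function with `alpha=0.01` on a design matrix `X` and outcome `y` returns the indices of the dropped observations together with the linear approximation and the refit-based exact change. A conclusion would be judged non-robust in this sense if the change is large enough to flip the sign or significance of the estimate. The sketch covers only a decrease in a single coefficient; the opposite direction would use the most negative influences instead.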
