A common goal in statistics and machine learning is to learn models that can perform well against distributional shifts, such as latent heterogeneous subpopulations, unknown covariate shifts, or unmodeled temporal effects. We develop and analyze a distributionally robust stochastic optimization (DRO) framework that learns a model providing good performance against perturbations to the data-generating distribution. We give a convex formulation for the problem, providing several convergence guarantees. We prove finite-sample upper and lower bounds, showing that distributional robustness comes at a cost in convergence rates. We give limit theorems for the parameters, where we fully specify the limiting distribution so that confidence intervals can be computed. On real tasks including generalizing to subpopulations, fine-grained recognition, and providing good tail performance, the distributionally robust approach often exhibits improved performance.
Duchi et al. (Fri,) studied this question.