August 25, 2025 | Open Access

Optimal Multi-Distribution Learning

Key Points

The algorithm returns an ε-optimal randomized hypothesis with sample complexity on the order of (d+k)/ε², up to logarithmic factors, matching the best-known lower bound.
The algorithms are oracle-efficient: they access the hypothesis class solely through an empirical risk minimization oracle.
Improper learning is shown to be necessary: a large sample size barrier arises when only deterministic, proper hypotheses are permitted.
The findings resolve three open problems posed at COLT 2023 (Problems 1, 3 and 4 of Awasthi et al.).

Abstract

Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across k distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, and multi-group collaboration. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik–Chervonenkis (VC) dimension d, we propose a novel algorithm that yields an ε-optimal randomized hypothesis with a sample complexity on the order of \( (d+k)/\varepsilon^{2} \) (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher classes. The proposed algorithms are oracle-efficient: they access the hypothesis class solely through an empirical risk minimization oracle. Additionally, we establish the necessity of improper learning, revealing a large sample size barrier when only deterministic, proper hypotheses are permitted. These findings resolve three open problems presented in COLT 2023 (i.e., Awasthi et al. [4], Problems 1, 3 and 4).
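
For readers who want the objective in symbols, the following is a minimal sketch of the standard MDL formulation that the abstract describes. The notation is an assumption made here for illustration and is not taken from this page: \( \mathcal{H} \) is the hypothesis class, \( D_1, \dots, D_k \) are the data distributions, \( \ell \) is a bounded loss, and \( \pi \) is a distribution over \( \mathcal{H} \) representing a randomized hypothesis.

```latex
% Risk of hypothesis h on the i-th distribution (assumed notation)
L_i(h) \;=\; \mathbb{E}_{(x,y)\sim D_i}\bigl[\ell(h(x),y)\bigr],
\qquad i = 1,\dots,k.

% Worst-case benchmark over the hypothesis class
\mathrm{OPT} \;=\; \min_{h\in\mathcal{H}} \; \max_{1\le i\le k} L_i(h).

% A randomized hypothesis \pi is \varepsilon-optimal if
\max_{1\le i\le k} \; \mathbb{E}_{h\sim\pi}\bigl[L_i(h)\bigr]
\;\le\; \mathrm{OPT} + \varepsilon.

% Sample complexity stated in the abstract, for VC dimension d
% (\widetilde{O} hides logarithmic factors)
\widetilde{O}\!\left(\frac{d+k}{\varepsilon^{2}}\right).
```

Under this reading, a proper deterministic learner would have to output a single \( h \in \mathcal{H} \), whereas the randomized output \( \pi \) is what the abstract calls improper.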

Cite This Study

Zhang et al. (2025) studied this question.

synapsesocial.com/papers/68af5f19ad7bf08b1eae21ae https://doi.org/10.1145/3760256
