We study the generalization properties of stochastic optimization methods under adaptive data sampling schemes, focusing on pairwise learning, which is central to tasks such as ranking, metric learning, and AUC maximization. Unlike pointwise learning, pairwise methods must address statistical dependencies between input pairs, a challenge that existing analyses do not adequately handle when sampling is adaptive. In this work, we extend a general framework that integrates two algorithm-dependent approaches, algorithmic stability and PAC–Bayes analysis, to this setting. Specifically, we examine (1) Pairwise Stochastic Gradient Descent (Pairwise SGD), widely used across machine learning applications, and (2) Pairwise Stochastic Gradient Descent Ascent (Pairwise SGDA), common in adversarial training. Our analysis avoids artificial randomization and instead leverages the inherent stochasticity of gradient updates. Our results yield generalization guarantees of order $n^{-1/2}$ under non-uniform adaptive sampling strategies, covering both smooth and non-smooth convex settings. We believe these findings address a significant gap in the theory of pairwise learning with adaptive sampling.
Zhou et al. (Fri,) studied this question.
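For orientation, the Pairwise SGD iteration named in the abstract can be sketched in its standard textbook form. This is not necessarily the exact variant analyzed in this work: the symbols $w_t$, $\eta_t$, $f$, $z_i$, and $p_t$ are notation assumed here for illustration, and $n$ is taken to be the number of training examples.

```latex
% Empirical pairwise risk over a sample S = {z_1, ..., z_n} (assumed notation)
\[
R_S(w) = \frac{1}{n(n-1)} \sum_{i \neq j} f(w; z_i, z_j)
\]
% One step: draw an index pair (i_t, j_t), i_t \neq j_t, from a sampling
% distribution p_t that may be non-uniform and may depend on earlier iterates
\[
w_{t+1} = w_t - \eta_t \nabla_w f(w_t; z_{i_t}, z_{j_t}),
\qquad (i_t, j_t) \sim p_t
\]
```

Under this notation, uniform sampling is the special case $p_t(i, j) = 1/(n(n-1))$ for every $i \neq j$. Adaptive schemes let $p_t$ change from step to step.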