Key points are not available for this paper at this time.
We study robust distributed learning that involves minimizing a non-convex function with saddle points. We consider the Byzantine setting where some machines have abnormal or even arbitrary and adversarial behavior. In setting, the Byzantine machines may create fake local minima near a saddle that is far away from any true local minimum, even when robust gradient are used. We develop ByzantinePGD, a robust first-order algorithm can provably escape saddle points and fake local minima, and converge to approximate true local minimizer with low iteration complexity. As a-product, we give a simpler algorithm and analysis for escaping saddle points the usual non-Byzantine setting. We further discuss three robust gradient that can be used in ByzantinePGD, including median, trimmed mean, iterative filtering. We characterize their performance in concrete settings, and argue for their near-optimality in low and high regimes.
Yin et al. (Thu,) studied this question.