What question did this study set out to answer?

The main goal is to develop a distributed learning approach that is both Byzantine-robust and communication-efficient, avoiding reliance on full gradient data.

February 6, 2026

Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering

Key Points

Proposes a stochastic distributed learning method that imposes no batch-size requirements
Leverages Polyak momentum to mitigate noise from both biased compressors and stochastic gradients
Establishes tight complexity bounds for nonconvex smooth loss functions
Validates the method through extensive experiments on binary and image classification tasks
Converges to a smaller neighborhood around the solution than existing methods
Defends against Byzantine workers under information compression
Shows superior benchmark performance on both binary and image classification tasks

Abstract

Distributed learning is the standard for training large-scale models across private data silos, offering privacy and efficiency but facing challenges in Byzantine robustness and communication efficiency. Existing Byzantine-robust and communication-efficient methods rely on full gradient information, and they converge only to an unnecessarily large neighborhood around the solution. Motivated by these issues, we propose a novel Byzantine-robust and communication-efficient stochastic distributed learning method that imposes no requirements on batch size and converges to a smaller neighborhood, aligning with the theoretical lower bound. Our key innovation is leveraging Polyak momentum to mitigate the noise caused by both biased compressors and stochastic gradients, thus defending against Byzantine workers under information compression. We prove tight complexity bounds for nonconvex smooth loss functions. Finally, we validate the practical significance of our algorithm through an extensive series of experiments, benchmarking its performance on both binary classification and image classification tasks.
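
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of how Polyak momentum, a biased compressor, and a robust aggregator could fit together. It is not the authors' algorithm. Everything specific is an assumption chosen for illustration: the Top-k compressor, the exponential-moving-average form of the momentum, compressing the difference between a worker's momentum and the server's running copy of it, the coordinate-wise median aggregator, the toy quadratic loss, and all names and hyperparameters (`top_k`, `momentum_step`, `coordinate_median`, `run_demo`, `lr`, `beta`).

```python
import numpy as np


def top_k(v, k):
    """Biased Top-k compressor (assumed): keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def momentum_step(m, grad, beta):
    """Polyak momentum in exponential-moving-average form (assumed parameterization)."""
    return beta * m + (1.0 - beta) * grad


def coordinate_median(vectors):
    """Coordinate-wise median, used here as an illustrative robust aggregator."""
    return np.median(np.asarray(vectors), axis=0)


def run_demo(n_workers=10, n_byzantine=2, dim=20, k=5, steps=500,
             lr=0.05, beta=0.9, seed=0):
    """Toy run on f(x) = 0.5 * ||x - target||^2 with batch-size-1 noisy gradients."""
    rng = np.random.default_rng(seed)
    target = np.ones(dim)
    x = np.zeros(dim)
    m = np.zeros((n_workers, dim))  # each worker's local momentum
    g = np.zeros((n_workers, dim))  # server's running copy of each worker's momentum
    for _ in range(steps):
        for i in range(n_workers):
            if i < n_byzantine:
                # Byzantine worker: the server ends up holding an arbitrary vector.
                g[i] = rng.normal(0.0, 100.0, size=dim)
                continue
            # Honest worker: one noisy gradient, no large batch needed.
            grad = (x - target) + rng.normal(0.0, 1.0, size=dim)
            m[i] = momentum_step(m[i], grad, beta)
            # Only the compressed difference is sent; the server adds it to its copy.
            g[i] += top_k(m[i] - g[i], k)
        # Robust aggregation of the reconstructed momenta, then a descent step.
        x = x - lr * coordinate_median(g)
    return np.linalg.norm(x - target)


if __name__ == "__main__":
    print("final distance to the minimizer:", run_demo())
```

The momentum averages out stochastic-gradient noise before compression, so honest messages cluster more tightly and outliers sent by Byzantine workers are easier for the aggregator to discount. The paper's actual compressor, aggregator, and update rule may differ from this sketch.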

Cite This Study

Liu et al. (Thu,) studied this question.

synapsesocial.com/papers/698584f98f7c464f23008434 https://doi.org/10.1109/tnnls.2026.3658104
