Empirical Risk Minimization (ERM) lies at the foundation of modern Machine Learning (ML), supported by advances in optimization theory that yield efficient algorithms with provable convergence guarantees. Increasing constraints on privacy, memory, computation, and communication have nonetheless motivated the use of decentralized networks of devices for data collection and processing. Real-world decentralized systems are often subject to node or link failures caused by hardware faults or cyberattacks. In the absence of a central coordinator, designing decentralized ML algorithms that remain robust to such failures while minimizing communication and computation overhead has become a key challenge. This dissertation addresses this challenge through the design and analysis of three algorithms (BRIDGE, CUBED-GD, and RESIST) that jointly advance the theoretical and practical understanding of robust and efficient decentralized learning. BRIDGE (Byzantine-ResIlient Decentralized Gradient dEscent) establishes robustness guarantees for vector-valued models and characterizes their statistical learning rates. CUBED-GD (CommUnication-efficient Byzantine-rEsilient Decentralized Gradient Descent) achieves robustness while substantially reducing the number of communication rounds needed to reach a target accuracy, improving efficiency in bandwidth-limited settings. RESIST (Resilient dEcentralized learning using conSensus gradIent deScenT) attains linear convergence for strongly convex objectives via a multi-step consensus mechanism, enhancing computational efficiency. Each algorithm is rigorously analyzed for both algorithmic and statistical convergence (sample complexity) under convex and nonconvex loss functions and empirically validated on real-world datasets, including MNIST and CIFAR-10.
Collectively, these contributions provide a unified framework for decentralized learning that balances resilience, communication and computation efficiency, and scalability, bridging theoretical robustness with practical performance for deployment in adversarial and resource-constrained environments.
Cheng Fang (Thu,) studied this question.