Background
In decentralized machine learning systems, agents often have varying computational and communication resources, leading to straggler problems and inefficient training. Existing federated learning methods typically rely on a central server, creating a bottleneck and limiting scalability in resource-constrained or failure-prone environments.
Technology Overview
Researchers at the University of Nevada, Reno have developed ComDML—a communication-efficient, serverless, decentralized multi-agent learning framework that balances workload among heterogeneous agents. ComDML allows slower agents to offload portions of their tasks to faster peers through local-loss-based split training, reducing idle time and improving resource utilization. A dynamic decentralized pairing scheduler optimizes these offloads using integer programming based on both computation and communication capacities.
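The offloading idea can be illustrated with a small sketch. It is not the ComDML scheduler: the published method formulates pairing as an integer program, whereas this example swaps in a simple greedy heuristic (slowest agent paired with fastest) and a toy cost model. The names `Agent`, `NUM_LAYERS`, `ACTIVATION_MB`, `pair_time`, `best_cut`, and `greedy_pairing`, along with all numeric values, are assumptions made for illustration only.

```python
from dataclasses import dataclass

NUM_LAYERS = 8        # hypothetical model depth
ACTIVATION_MB = 4.0   # hypothetical size of activations sent per round


@dataclass
class Agent:
    name: str
    speed: float      # hypothetical unit: layers processed per second
    bandwidth: float  # hypothetical unit: MB per second to a peer


def pair_time(slow: Agent, fast: Agent, cut: int) -> float:
    """Estimated round time when `slow` keeps the first `cut` layers and
    offloads the rest to `fast`, which also trains its own full model."""
    if cut == NUM_LAYERS:
        # No offloading: both agents train alone, the slower one dominates.
        return max(NUM_LAYERS / slow.speed, NUM_LAYERS / fast.speed)
    slow_time = cut / slow.speed
    comm_time = ACTIVATION_MB / min(slow.bandwidth, fast.bandwidth)
    fast_time = (NUM_LAYERS + (NUM_LAYERS - cut)) / fast.speed
    # Assumption: with a local loss, the slow side does not wait for
    # gradients from the fast side, so both sides run in parallel.
    return max(slow_time + comm_time, fast_time)


def best_cut(slow: Agent, fast: Agent):
    """Try every split point and return (cut, estimated round time)."""
    cut = min(range(1, NUM_LAYERS + 1), key=lambda c: pair_time(slow, fast, c))
    return cut, pair_time(slow, fast, cut)


def greedy_pairing(agents):
    """Greedy stand-in for the integer program: pair slowest with fastest.
    With an odd number of agents, the middle one is left unpaired."""
    ordered = sorted(agents, key=lambda a: a.speed)
    pairs = []
    while len(ordered) >= 2:
        slow = ordered.pop(0)
        fast = ordered.pop()
        cut, t = best_cut(slow, fast)
        pairs.append((slow.name, fast.name, cut, t))
    return pairs
```

Under this toy model, an agent that would need 8 seconds alone can finish its round in well under half that time by handing its later layers to a peer four times faster. The estimate already includes the time to transfer activations, which is why both computation and communication capacity enter the cost.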
The technology supports scalable parallel model updates and AllReduce-based aggregation, and it integrates privacy-preserving mechanisms such as differential privacy, patch shuffling, and distance correlation. Experiments using ResNet-56 and ResNet-110 on the CIFAR-10, CIFAR-100, and CINIC-10 datasets show up to a 71% reduction in training time with accuracy comparable to state-of-the-art methods.
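To show what serverless aggregation means in practice, the following sketch simulates an AllReduce average in plain Python. The summary does not say which AllReduce algorithm ComDML uses, so the ring variant here is an assumption, and the function name `ring_allreduce_mean` is hypothetical. A real deployment would more likely call a collective-communication library than hand-roll this logic.

```python
def ring_allreduce_mean(vectors):
    """Simulate a ring all-reduce so every agent ends with the element-wise
    mean of all agents' vectors, without any central server."""
    n = len(vectors)
    data = [list(v) for v in vectors]   # each agent's local copy
    dim = len(data[0])

    def chunk(c):
        # Indices belonging to chunk c (strided split into n chunks).
        return range(c % n, dim, n)

    # Phase 1, reduce-scatter: after n-1 steps agent i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Build all messages first so the sends are effectively simultaneous.
        msgs = []
        for i in range(n):
            c = (i - step) % n
            msgs.append((c, [data[i][k] for k in chunk(c)]))
        for i, (c, payload) in enumerate(msgs):
            dst = (i + 1) % n
            for k, val in zip(chunk(c), payload):
                data[dst][k] += val

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            msgs.append((c, [data[i][k] for k in chunk(c)]))
        for i, (c, payload) in enumerate(msgs):
            dst = (i + 1) % n
            for k, val in zip(chunk(c), payload):
                data[dst][k] = val

    return [[x / n for x in row] for row in data]
```

Each agent talks only to its ring neighbor, and every agent finishes with the same averaged model. No single node has to receive all updates, which is the property that removes the central-server bottleneck.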
Figure 1: Workload balancing
Further Details
For more details, refer to the full publication.
S. M. Sajjadi Mohammadabadi, L. Yang, F. Yan and J. Zhang, "Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning," 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), Jersey City, NJ, USA, 2024, pp. 680-691, doi: 10.1109/ICDCS60910.2024.00069.
Benefits
- Reduces training time by up to 71% in the reported experiments
- No central server required—improves robustness and scalability
- Compatible with non-IID data and large deep models
- Maintains high model accuracy under privacy constraints
- Adaptable to heterogeneous device environments
Applications
- Distributed AI in mobile and IoT devices
- Edge computing in smart cities, autonomous systems, and sensor networks
- Collaborative learning in swarm robotics or vehicle networks
- Privacy-aware healthcare or financial data analytics