
Communication-Efficient Decentralized Multi-agent Machine Learning Method

Case ID: UNR25-023

Background

In decentralized machine learning systems, agents often have varying computational and communication resources, leading to straggler problems and inefficient training. Existing federated learning methods typically rely on a central server, creating a bottleneck and limiting scalability in resource-constrained or failure-prone environments.

Technology Overview

Researchers at the University of Nevada, Reno have developed ComDML, a communication-efficient, serverless, decentralized multi-agent learning framework that balances workload among heterogeneous agents. ComDML allows slower agents to offload portions of their training tasks to faster peers through local-loss-based split training, reducing idle time and improving resource utilization. A dynamic decentralized pairing scheduler optimizes these offloads using integer programming that accounts for both the computation and the communication capacity of each agent.
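The pairing idea can be illustrated with a small sketch. It is not the ComDML scheduler: it replaces the integer program with an exhaustive search that is only practical for a handful of agents, and the cost model (per-agent speeds, a single scalar `link_cost`, a fixed set of `OFFLOAD_FRACTIONS`) as well as the function names `pair_time` and `best_pairing` are assumptions made for illustration.

```python
# Hypothetical candidate shares of a slow agent's work that may be offloaded.
OFFLOAD_FRACTIONS = (0.0, 0.25, 0.5, 0.75)


def pair_time(speed_a, speed_b, link_cost, work=1.0):
    """Return (time, fraction) for the best offload between two agents.

    Assumed cost model: the slower agent keeps (1 - f) of its work, the
    faster agent does its own work plus the offloaded share f, and the
    transfer cost grows linearly with f.
    """
    slow, fast = sorted((speed_a, speed_b))
    best = None
    for f in OFFLOAD_FRACTIONS:
        slow_t = (1 - f) * work / slow
        fast_t = (1 + f) * work / fast + link_cost * f
        t = max(slow_t, fast_t)  # the pair finishes when both are done
        if best is None or t < best[0]:
            best = (t, f)
    return best


def best_pairing(speeds, link_cost, work=1.0):
    """Search all pairings (agents may stay unpaired) for the shortest round.

    Returns (round_time, [(agent_i, agent_j, offload_fraction), ...]).
    """

    def search(remaining):
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        # Option 1: `first` trains alone.
        t_rest, pairs = search(rest)
        best = (max(work / speeds[first], t_rest), pairs)
        # Option 2: pair `first` with each other remaining agent.
        for other in rest:
            t_pair, frac = pair_time(speeds[first], speeds[other], link_cost, work)
            left = tuple(a for a in rest if a != other)
            t_rest, pairs = search(left)
            cand = (max(t_pair, t_rest), pairs + [(first, other, frac)])
            if cand[0] < best[0]:
                best = cand
        return best

    return search(tuple(range(len(speeds))))


if __name__ == "__main__":
    round_time, pairs = best_pairing([1.0, 4.0, 2.0, 8.0], link_cost=0.1)
    print(round_time, pairs)
```

With these illustrative numbers, the slowest agent is matched with the fastest one and the round time drops below the time the slowest agent would need alone. A real scheduler would use per-link bandwidths and a solver rather than enumeration.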

The technology supports scalable parallel model updates and AllReduce-based aggregation, and it integrates privacy-preserving mechanisms such as differential privacy, patch shuffling, and distance correlation. Experiments using ResNet-56 and ResNet-110 on the CIFAR-10, CIFAR-100, and CINIC-10 datasets show up to a 71% reduction in training time with accuracy comparable to state-of-the-art methods.
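To make the serverless aggregation step concrete, the sketch below shows only the end result of an AllReduce-style average: every agent finishes with the same averaged parameters and no central server holds the model. It does not model the actual communication pattern (for example a ring), and the function name `allreduce_mean` and the optional `weights` argument are assumptions for illustration.

```python
def allreduce_mean(local_params, weights=None):
    """Return the parameters each agent would hold after an all-reduce average.

    local_params: one flat list of floats per agent, all of equal length.
    weights: optional per-agent weights (e.g. local sample counts); assumed.
    """
    n = len(local_params)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    dim = len(local_params[0])
    averaged = [
        sum(w * params[k] for w, params in zip(weights, local_params)) / total
        for k in range(dim)
    ]
    # Every agent receives its own copy of the same result.
    return [list(averaged) for _ in range(n)]


if __name__ == "__main__":
    print(allreduce_mean([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```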

Figure 1: Workload balancing

Further Details:

For more details, refer to the full publication.

S. M. Sajjadi Mohammadabadi, L. Yang, F. Yan and J. Zhang, "Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning," 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), Jersey City, NJ, USA, 2024, pp. 680-691, doi: 10.1109/ICDCS60910.2024.00069.

Benefits