Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems
By: Xi Shi, Mengxin Zheng, Qian Lou
Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but they often incur high inference latency from multi-step execution and repeated model invocations, which severely limits their scalability and usability in time-sensitive scenarios. Most existing approaches optimize task performance and inference cost and assume, explicitly or implicitly, sequential execution, which makes them poorly suited to controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), an orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, so that the controller constructs execution topology graphs with lower latency. In our experiments across multiple benchmarks, LAMaS reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS
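To illustrate why the critical path, rather than the total amount of work, determines latency under parallel execution, the following is a minimal sketch and not the LAMaS implementation. The agent names, latency values, and the function `critical_path_latency` are hypothetical, chosen only for this example. It assumes each agent starts as soon as all of its predecessors have finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_latency(latency, deps):
    """Return (critical-path latency, sequential latency) for an agent DAG.

    latency: dict mapping each agent to its latency in seconds (hypothetical)
    deps:    dict mapping each agent to the set of agents it waits for
    """
    finish = {}
    # static_order() yields every agent after all of its predecessors.
    for node in TopologicalSorter(deps).static_order():
        # An agent starts once its slowest predecessor has finished.
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        finish[node] = start + latency[node]
    # Parallel latency is the latest finish time; sequential is the plain sum.
    return max(finish.values()), sum(latency.values())


# Hypothetical topology: a planner fans out to two solvers, then a judge merges.
latency = {"planner": 2.0, "solver_a": 3.0, "solver_b": 4.0, "judge": 1.0}
deps = {
    "solver_a": {"planner"},
    "solver_b": {"planner"},
    "judge": {"solver_a", "solver_b"},
}

critical, sequential = critical_path_latency(latency, deps)
print(critical, sequential)  # 7.0 10.0
```

In this toy graph the two solvers run concurrently, so only the slower one (`solver_b`) contributes to the end-to-end latency. A controller that optimizes the sequential sum would not distinguish this topology from a chain with the same agents, which is the gap that latency supervision under parallel execution is meant to address.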
Similar Papers
Latent Collaboration in Multi-Agent Systems
Computation and Language
AI models work together better in their minds.
Orchestrator: Active Inference for Multi-Agent Systems in Long-Horizon Tasks
Multiagent Systems
Helps robots work together better on hard jobs.
BAMAS: Structuring Budget-Aware Multi-Agent Systems
Multiagent Systems
Saves money on smart computer teams.