FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
By: Cui Miao, Tao Chang, Meihan Wu, and more
Potential Business Impact:
Trains robots privately without sharing data
Vision-language-action (VLA) models have significantly advanced robotic manipulation by enabling robots to interpret language instructions for task execution. However, training these models often relies on large-scale, user-specific data, raising privacy and security concerns that limit their broader adoption. To address this, we propose FedVLA, the first federated VLA learning framework, which enables distributed model training that preserves data privacy without compromising performance. Our framework integrates task-aware representation learning, adaptive expert selection, and expert-driven federated aggregation for efficient, privacy-preserving training of VLA models. Specifically, we introduce an Instruction Oriented Scene-Parsing mechanism, which decomposes and enhances object-level features based on task instructions, improving contextual understanding. To learn diverse task patterns effectively, we design a Dual Gating Mixture-of-Experts (DGMoE) mechanism, in which not only input tokens but also self-aware experts adaptively decide their activation. Finally, we propose an Expert-Driven Aggregation strategy at the federated server, where model aggregation is guided by activated experts, ensuring effective cross-client knowledge transfer. Extensive simulations and real-world robotic experiments demonstrate the effectiveness of our proposals. Notably, DGMoE significantly improves computational efficiency over its vanilla counterpart, while FedVLA achieves task success rates comparable to centralized training, effectively preserving data privacy.
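The abstract describes the DGMoE routing only at a high level. As an illustration, the following is a minimal PyTorch sketch of one way a dual-gated MoE layer could work: a token-side router selects top-k experts, and each expert applies its own self-aware gate to decide whether it actually activates for a given token. The expert networks, the sigmoid self-gate, and the 0.5 acceptance threshold are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DGMoE(nn.Module):
    """Sketch of a dual-gated mixture-of-experts layer (illustrative only)."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)     # gate 1: tokens choose experts
        self.self_gate = nn.Linear(dim, num_experts)  # gate 2: experts accept/decline tokens (assumed form)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)                # token-to-expert affinities
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        accept = torch.sigmoid(self.self_gate(x))                 # per-expert self-activation scores
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                # An expert runs only if the router picked it AND it accepts the token.
                mask = (topk_idx[:, slot] == e) & (accept[:, e] > 0.5)
                if mask.any():
                    w = topk_scores[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

Because experts that decline a token are skipped entirely, such a layer does less work than a vanilla MoE forward pass, which is consistent with the abstract's efficiency claim. On the server side, the Expert-Driven Aggregation strategy could plausibly weight each client's expert parameters by how often that expert was activated locally; the sketch below assumes clients report per-expert activation counts alongside their weights, and the activation-weighted averaging rule is inferred from the abstract, not the paper's stated rule.

def aggregate_experts(client_states, client_counts):
    """client_states: list of state dicts (param name -> tensor), one per client,
    with expert params named "experts.{e}...." as in the DGMoE sketch above.
    client_counts: list of dicts (expert index -> local activation count)."""
    agg = {}
    num_clients = len(client_states)
    for name in client_states[0]:
        if name.startswith("experts."):
            e = int(name.split(".")[1])
            weights = [counts.get(e, 0) for counts in client_counts]
        else:
            weights = [1.0] * num_clients  # plain FedAvg for shared (non-expert) params
        total = sum(weights) or 1.0        # guard: expert activated by no client
        agg[name] = sum((w / total) * state[name] for w, state in zip(weights, client_states))
    return agg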
Similar Papers
Graph-Fused Vision-Language-Action for Policy Reasoning in Multi-Arm Robotic Manipulation
Robotics
Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control
Robotics
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
CV and Pattern Recognition