ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation

Published: May 28, 2025 | arXiv ID: 2505.22159v3

By: Jiawen Yu, Hairuo Liu, Qiaojun Yu, and more

Potential Business Impact:

Robots that can sense force handle contact-heavy tasks, such as plug insertion, more reliably.

Business Areas:
Autonomous Vehicles, Transportation

Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose ForceVLA, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. ForceVLA introduces FVLMoE, a force-aware Mixture-of-Experts fusion module that dynamically integrates pretrained visual-language embeddings with real-time 6-axis force feedback during action decoding. This enables context-aware routing across modality-specific experts, enhancing the robot's ability to adapt to subtle contact dynamics. We also introduce ForceVLA-Data, a new dataset comprising synchronized vision, proprioception, and force-torque signals across five contact-rich manipulation tasks. ForceVLA improves average task success by 23.2% over strong π0-based baselines, achieving up to 80% success in tasks such as plug insertion. Our approach highlights the importance of multimodal integration for dexterous manipulation and sets a new benchmark for physically intelligent robotic control. Code and data will be released at https://sites.google.com/view/forcevla2025.
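
To make the idea of a force-aware Mixture-of-Experts fusion module concrete, here is a minimal PyTorch sketch of how 6-axis force-torque feedback could be fused with pretrained vision-language embeddings via gated routing across modality-specific experts. All names, dimensions, and the top-k routing choice are illustrative assumptions for clarity, not the authors' FVLMoE implementation.

```python
# Sketch of a force-aware MoE fusion block (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ForceAwareMoE(nn.Module):
    def __init__(self, vl_dim=1024, force_dim=6, hidden_dim=512,
                 num_experts=4, top_k=2):
        super().__init__()
        # Project 6-axis force/torque readings into the fusion space.
        self.force_proj = nn.Linear(force_dim, hidden_dim)
        # Project pretrained vision-language embeddings into the same space.
        self.vl_proj = nn.Linear(vl_dim, hidden_dim)
        # Modality-specific expert MLPs.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim),
                          nn.GELU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(num_experts)
        ])
        # Gating network conditioned on both modalities (context-aware routing).
        self.gate = nn.Linear(2 * hidden_dim, num_experts)
        self.top_k = top_k

    def forward(self, vl_emb, force):
        # vl_emb: (B, vl_dim) pooled vision-language features
        # force:  (B, 6) synchronized force-torque reading
        v = self.vl_proj(vl_emb)
        f = self.force_proj(force)
        fused = v + f                                   # simple additive fusion
        logits = self.gate(torch.cat([v, f], dim=-1))   # routing scores
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(fused)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                     # expert id per sample
            w = topk_w[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e)
                if mask.any():
                    out[mask] += w[mask] * expert(fused[mask])
        return out                                      # features for the action decoder


# Example: fuse a batch of 2 embeddings with force feedback.
if __name__ == "__main__":
    moe = ForceAwareMoE()
    vl = torch.randn(2, 1024)
    ft = torch.randn(2, 6)
    print(moe(vl, ft).shape)  # torch.Size([2, 512])
```

The key point this sketch illustrates is that the gate sees both the visual-language and force projections, so routing can shift toward force-sensitive experts when contact dynamics dominate (e.g., under visual occlusion).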

Country of Origin
🇨🇳 China

Page Count
20 pages

Category
Computer Science:
Robotics