ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation
By: Jiawen Yu, Hairuo Liu, Qiaojun Yu, and more
Potential Business Impact:
Robots feel and move better with touch.
Vision-Language-Action (VLA) models have advanced general-purpose robotic manipulation by leveraging pretrained visual and linguistic representations. However, they struggle with contact-rich tasks that require fine-grained control involving force, especially under visual occlusion or dynamic uncertainty. To address these limitations, we propose ForceVLA, a novel end-to-end manipulation framework that treats external force sensing as a first-class modality within VLA systems. ForceVLA introduces FVLMoE, a force-aware Mixture-of-Experts fusion module that dynamically integrates pretrained visual-language embeddings with real-time 6-axis force feedback during action decoding. This enables context-aware routing across modality-specific experts, enhancing the robot's ability to adapt to subtle contact dynamics. We also introduce ForceVLA-Data, a new dataset comprising synchronized vision, proprioception, and force-torque signals across five contact-rich manipulation tasks. ForceVLA improves average task success by 23.2% over strong π0-based baselines, achieving up to 80% success in tasks such as plug insertion. Our approach highlights the importance of multimodal integration for dexterous manipulation and sets a new benchmark for physically intelligent robotic control. Code and data will be released at https://sites.google.com/view/forcevla2025.
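To make the force-aware Mixture-of-Experts idea concrete, below is a minimal sketch of such a fusion layer in PyTorch: a learned gate routes the combined vision-language and 6-axis force-torque features to a small set of experts and mixes the top-k outputs. All class names, dimensions, and hyperparameters (ForceAwareMoE, vl_dim, num_experts, top_k, etc.) are hypothetical illustrations of the concept described in the abstract, not the authors' FVLMoE implementation.

```python
# Hypothetical sketch of a force-aware MoE fusion layer (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ForceAwareMoE(nn.Module):
    """Routes fused vision-language and force-torque features
    across modality-specific experts via a learned gate."""

    def __init__(self, vl_dim=1024, force_dim=6, hidden_dim=256,
                 num_experts=4, top_k=2):
        super().__init__()
        self.vl_proj = nn.Linear(vl_dim, hidden_dim)        # project pooled VL embedding
        self.force_proj = nn.Linear(force_dim, hidden_dim)  # embed 6-axis force-torque reading
        self.gate = nn.Linear(2 * hidden_dim, num_experts)  # context-aware router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim),
                          nn.GELU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, vl_emb, force):
        # vl_emb: (B, vl_dim) pooled vision-language features
        # force:  (B, 6) real-time force-torque signal
        fused = torch.cat([self.vl_proj(vl_emb), self.force_proj(force)], dim=-1)

        # Run every expert, then keep a softmax-weighted mix of the top-k per sample.
        expert_outs = torch.stack([e(fused) for e in self.experts], dim=1)  # (B, E, H)
        logits = self.gate(fused)                                           # (B, E)
        weights, idx = logits.topk(self.top_k, dim=-1)                      # (B, k)
        weights = F.softmax(weights, dim=-1)
        gathered = expert_outs.gather(
            1, idx.unsqueeze(-1).expand(-1, -1, expert_outs.size(-1)))      # (B, k, H)
        return (weights.unsqueeze(-1) * gathered).sum(dim=1)                # (B, H)


# Usage sketch: the fused feature would feed the action decoder.
moe = ForceAwareMoE()
vl = torch.randn(2, 1024)   # batch of pooled vision-language embeddings
ft = torch.randn(2, 6)      # batch of 6-axis force-torque readings
fused_feat = moe(vl, ft)    # (2, 256)
```

The top-k gating is one common way to realize "context-aware routing across modality-specific experts"; the actual FVLMoE module may differ in expert design, routing, and how it is wired into the action decoding loop.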
Similar Papers
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
Robotics
Robots learn new tasks by watching and moving.
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
Robotics
Robots learn to feel and predict touch for better tasks.
EdgeVLA: Efficient Vision-Language-Action Models
Robotics
Makes robots understand and move faster.