MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models

Published: March 11, 2025 | arXiv ID: 2503.08007v1

By: Han Zhao, Wenxuan Song, Donglin Wang and more

Potential Business Impact:

Robots learn to do many jobs by watching and listening.

Business Areas:

Robotics Hardware, Science and Engineering, Software

Developing versatile quadruped robots that can smoothly perform a variety of actions and tasks in real-world environments remains a significant challenge. This paper introduces mixture of robotic experts (MoRE), a novel vision-language-action (VLA) model for quadruped robots that uses reinforcement learning (RL) to fine-tune large-scale VLA models on a large amount of mixed-quality data. MoRE integrates multiple low-rank adaptation (LoRA) modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparsely activated mixture-of-experts model. This design enables the model to adapt effectively to a wide array of downstream tasks. Moreover, after closely analyzing the structural properties of the tasks, we employ an RL-based training objective that trains the model as a Q-function. Learning effectively from automatically collected mixed-quality data improves both data efficiency and model performance. Extensive experiments demonstrate that MoRE outperforms all baselines across six different skills and generalizes better in out-of-distribution scenarios. We further validate the method in real-world scenarios, confirming its practicality and laying a solid foundation for future research on multi-task learning in quadruped robots.
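To make the architectural idea in the abstract concrete, here is a minimal sketch of a frozen dense linear layer augmented with several low-rank adapters that act as sparsely routed experts. It is not the authors' implementation. The class names (`LoRAExpert`, `MoLoRALinear`), the linear router, the top-k selection rule, the rank, and the initialization are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """One low-rank adapter: delta(x) = B @ (A @ x), with rank much smaller than d."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the adapter initially leaves the base layer unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


class MoLoRALinear:
    """Hypothetical layer: frozen dense weight plus top-k routed LoRA experts."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, top_k=1, seed=0):
        rng = random.Random(seed)
        # Stand-in for a pretrained weight that stays frozen during fine-tuning.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Assumed router: one linear score per expert.
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(n_experts)]
        self.experts = [LoRAExpert(d_in, d_out, rank, rng) for _ in range(n_experts)]
        self.top_k = top_k

    def route(self, x):
        """Return (expert_index, weight) pairs for the top-k scoring experts."""
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def __call__(self, x):
        y = matvec(self.W, x)
        # Only the selected experts run, which is what makes the layer sparsely activated.
        for idx, weight in self.route(x):
            delta = self.experts[idx](x)
            y = [yi + weight * di for yi, di in zip(y, delta)]
        return y


layer = MoLoRALinear(d_in=4, d_out=3, n_experts=4, rank=2, top_k=2)
x = [1.0, -0.5, 0.25, 2.0]
routes = layer.route(x)
y = layer(x)
```

In this sketch only `top_k` of the `n_experts` adapters are evaluated per input, so the added compute grows with `top_k` rather than with the total number of experts. The Q-function training objective mentioned in the abstract is not shown here.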

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation

Robotics

Makes robots smarter and faster using less power.

26 Mar 2025

89%

MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

Robotics

Teaches robots to follow spoken commands precisely.

22 Nov 2025

89%

HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies

Robotics

Robots learn from many different robot videos.

5 Dec 2025

Page Count

7 pages
