CubeRobot: Grounding Language in Rubik's Cube Manipulation via Vision-Language Model
By: Feiyang Wang, Xiaomin Yu, Wangyu Wu
Potential Business Impact:
A robot solves Rubik's Cubes using vision and reasoning.
Proving Rubik's Cube theorems at the high level represents a notable milestone in human-level spatial imagination and logical reasoning. Traditional Rubik's Cube robots, relying on complex vision systems and fixed algorithms, often struggle to adapt to complex and dynamic scenarios. To overcome this limitation, we introduce CubeRobot, a novel vision-language model (VLM) tailored for solving 3x3 Rubik's Cubes, empowering embodied agents with multimodal understanding and execution capabilities. We use the CubeCoT image dataset, which contains multi-level tasks (43 subtasks in total) that humans are unable to handle, encompassing a wide range of cube states. We incorporate a dual-loop VisionCoT architecture and a Memory Stream, a paradigm for extracting task-related features from VLM-generated planning queries, enabling CubeRobot to plan, decide, and reflect independently, and to manage high- and low-level Rubik's Cube tasks separately. Furthermore, CubeRobot achieved a 100% accuracy rate on low-level Rubik's Cube restoration tasks, a matching 100% on medium-level tasks, and 80% on high-level tasks.
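The dual-loop design pairs an outer planning loop with an inner execution-and-reflection loop, with the Memory Stream persisting task-related features between iterations. Below is a minimal Python sketch of that control flow; the class and method names (`DualLoopController`, `MemoryStream`, `vlm.plan`, `robot.execute_move`, etc.) are illustrative assumptions, since the paper does not publish an implementation or API.

```python
# Hypothetical sketch of a dual-loop VisionCoT controller with a Memory Stream.
# All interfaces here are assumed for illustration, not taken from the paper.

from dataclasses import dataclass, field


@dataclass
class MemoryStream:
    """Stores task-related features extracted from VLM planning queries."""
    entries: list = field(default_factory=list)

    def write(self, features: dict) -> None:
        self.entries.append(features)

    def recall(self, k: int = 3) -> list:
        # Return the k most recent feature records as planning context.
        return self.entries[-k:]


class DualLoopController:
    """Outer loop: high-level planning. Inner loop: execution + reflection."""

    def __init__(self, vlm, robot):
        self.vlm = vlm        # vision-language model wrapper (assumed interface)
        self.robot = robot    # cube-manipulation backend (assumed interface)
        self.memory = MemoryStream()

    def solve(self, image, max_outer_steps: int = 20) -> bool:
        for _ in range(max_outer_steps):
            # --- Outer loop: plan a move sequence from the current cube image ---
            context = self.memory.recall()
            plan = self.vlm.plan(image, context)        # e.g. ["R", "U", "R'"]
            self.memory.write({"plan": plan, "state": self.vlm.describe(image)})

            # --- Inner loop: execute each move, then reflect on the outcome ---
            for move in plan:
                self.robot.execute_move(move)
                image = self.robot.capture_image()
                verdict = self.vlm.reflect(image, move)  # did the move succeed?
                if verdict == "retry":
                    break  # drop back to the outer loop and replan

            if self.vlm.is_solved(image):
                return True
        return False
```

The key design point the sketch captures is the separation of concerns: the outer loop owns high-level planning against remembered context, while the inner loop handles low-level execution and triggers replanning when reflection flags a failed move.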
Similar Papers
Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System
Robotics
Aerial and ground robots use AI to work together and move objects.
Improving Generalization of Language-Conditioned Robot Manipulation
Robotics
Robots learn to move objects with few examples.
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Robotics
Tests how well vision-language models handle precise robot movements.