VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback
By: Jianxin Bi, Kevin Yuchen Ma, Ce Hao, and more
Potential Business Impact:
Robots that can feel perform contact-rich tasks better.
Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing without fine-tuning the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at https://github.com/jxbi1010/VLA-Touch.
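To make the dual-level design concrete, here is a minimal sketch of how the two tactile pathways could wrap a frozen VLA policy. All function names, signatures, and the toy "refinement" logic are illustrative assumptions, not the actual VLA-Touch implementation; a real system would use the paper's pretrained tactile-language model and a learned diffusion controller.

```python
import numpy as np

# Hypothetical interfaces -- names and behavior are placeholders, not the VLA-Touch API.

def tactile_language_model(tactile_reading: np.ndarray) -> str:
    """Stand-in for a pretrained tactile-language model: maps a raw tactile
    array to a semantic description consumed by the high-level planner."""
    mean_pressure = float(tactile_reading.mean())
    return "firm contact" if mean_pressure > 0.5 else "light or no contact"

def vla_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for the frozen base VLA: returns a coarse action chunk
    (here, a short sequence of 7-DoF end-effector deltas)."""
    rng = np.random.default_rng(0)
    return 0.01 * rng.standard_normal((8, 7))

def tactile_refiner(actions: np.ndarray, tactile_reading: np.ndarray,
                    steps: int = 5) -> np.ndarray:
    """Toy diffusion-style refinement: iteratively perturb and re-scale the
    VLA actions conditioned on tactile input. A real controller would use a
    learned denoising network; here the 'denoiser' simply damps motion
    when contact pressure is high."""
    contact = float(tactile_reading.mean())
    refined = actions.copy()
    for _ in range(steps):
        refined = refined + 0.001 * np.random.standard_normal(refined.shape)
        refined = refined * (1.0 - 0.5 * contact)  # gentler actions under contact
    return refined

# Dual-level integration: semantic tactile feedback informs planning,
# while low-level tactile signals refine execution -- the base VLA stays frozen.
tactile = np.random.rand(16, 16)        # placeholder tactile image
image = np.zeros((224, 224, 3))         # placeholder camera frame

semantic_feedback = tactile_language_model(tactile)
instruction = f"wipe the surface; current contact state: {semantic_feedback}"

coarse_actions = vla_policy(image, instruction)
final_actions = tactile_refiner(coarse_actions, tactile)
print(semantic_feedback, final_actions.shape)
```

The key design point the sketch mirrors is that tactile information enters at two levels without touching the VLA's weights: as language for planning, and as a conditioning signal for action refinement.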
Similar Papers
OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing
Robotics
Robots use touch to do tasks better.
TLA: Tactile-Language-Action Model for Contact-Rich Manipulation
Robotics
Teaches robots to feel and follow instructions.