Towards Logic-Aware Manipulation: A Knowledge Primitive for VLM-Based Assistants in Smart Manufacturing
By: Suchang Chen, Daqiang Guo
Existing pipelines for vision-language models (VLMs) in robotic manipulation prioritize broad semantic generalization from images and language, but typically omit the execution-critical parameters required for contact-rich actions in manufacturing cells. We formalize an object-centric manipulation-logic schema, serialized as an eight-field tuple τ, which exposes object, interface, trajectory, tolerance, and force/impedance information as a first-class knowledge signal between human operators, VLM-based assistants, and robot controllers. We instantiate τ and a small knowledge base (KB) on a 3D-printer spool-removal task in a collaborative cell, and analyze τ-conditioned VLM planning using plan-quality metrics adapted from recent VLM/LLM planning benchmarks. We further demonstrate how the same schema supports taxonomy-tagged data augmentation at training time and logic-aware retrieval-augmented prompting at test time, positioning τ as a building block for assistant systems in smart manufacturing enterprises.
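As a rough illustration of how such a tuple might be serialized and retrieved, the minimal Python sketch below encodes the fields the abstract names (object, interface, trajectory, tolerance, force/impedance) and a toy KB lookup for the spool-removal example. All field names, units, and the retrieve_logic helper are assumptions for illustration only; the abstract enumerates five of the eight fields, so the remaining fields are left as an unspecified extras mapping rather than invented here.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ManipulationLogic:
    """Illustrative serialization of the manipulation-logic tuple τ (field names assumed)."""
    object_id: str                      # manipulated object, e.g. "filament_spool"
    interface: str                      # contact/mating interface, e.g. "spool_holder_axle"
    trajectory: List[List[float]]       # waypoints, assumed to be in the robot base frame
    tolerance_mm: float                 # allowable positional deviation (assumed unit: mm)
    force_impedance: Dict[str, float]   # e.g. {"max_force_N": 15.0, "stiffness_N_per_m": 400.0}
    extras: Dict[str, str] = field(default_factory=dict)  # remaining schema fields, not named in the abstract

def retrieve_logic(kb: Dict[str, ManipulationLogic], object_id: str) -> Optional[ManipulationLogic]:
    """Hypothetical logic-aware retrieval step: look up τ for a detected object
    so it can be serialized into the VLM prompt at test time."""
    return kb.get(object_id)

# Minimal usage sketch: a one-entry KB for the spool-removal task (values are placeholders).
kb = {
    "filament_spool": ManipulationLogic(
        object_id="filament_spool",
        interface="spool_holder_axle",
        trajectory=[[0.45, 0.10, 0.30], [0.45, 0.10, 0.45]],
        tolerance_mm=2.0,
        force_impedance={"max_force_N": 15.0, "stiffness_N_per_m": 400.0},
    )
}

tau = retrieve_logic(kb, "filament_spool")
if tau is not None:
    # Serialize τ into a prompt fragment for τ-conditioned planning.
    prompt_fragment = f"Manipulation logic: {tau}"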