Towards Logic-Aware Manipulation: A Knowledge Primitive for VLM-Based Assistants in Smart Manufacturing
By: Suchang Chen, Daqiang Guo
Existing pipelines for vision-language models (VLMs) in robotic manipulation prioritize broad semantic generalization from images and language, but typically omit the execution-critical parameters required for contact-rich actions in manufacturing cells. We formalize an object-centric manipulation-logic schema, serialized as an eight-field tuple τ, which exposes object, interface, trajectory, tolerance, and force/impedance information as a first-class knowledge signal between human operators, VLM-based assistants, and robot controllers. We instantiate τ and a small knowledge base (KB) on a 3D-printer spool-removal task in a collaborative cell, and analyze τ-conditioned VLM planning using plan-quality metrics adapted from recent VLM/LLM planning benchmarks. We further demonstrate how the same schema supports taxonomy-tagged data augmentation at training time and logic-aware retrieval-augmented prompting at test time, positioning τ as a building block for assistant systems in smart manufacturing enterprises.
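As a rough illustration of how such a tuple might be serialized and retrieved, the minimal Python sketch below encodes the fields the abstract names (object, interface, trajectory, tolerance, force/impedance) and a toy KB lookup for the spool-removal example. All field names, units, and the retrieve_logic helper are assumptions for illustration only; the abstract enumerates five of the eight fields, so the remaining fields are left as an unspecified extras mapping rather than invented here.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ManipulationLogic:
    """Illustrative serialization of the manipulation-logic tuple τ (field names assumed)."""
    object_id: str                      # manipulated object, e.g. "filament_spool"
    interface: str                      # contact/mating interface, e.g. "spool_holder_axle"
    trajectory: List[List[float]]       # waypoints, assumed to be in the robot base frame
    tolerance_mm: float                 # allowable positional deviation (assumed unit: mm)
    force_impedance: Dict[str, float]   # e.g. {"max_force_N": 15.0, "stiffness_N_per_m": 400.0}
    extras: Dict[str, str] = field(default_factory=dict)  # remaining schema fields, not named in the abstract

def retrieve_logic(kb: Dict[str, ManipulationLogic], object_id: str) -> Optional[ManipulationLogic]:
    """Hypothetical logic-aware retrieval step: look up τ for a detected object
    so it can be serialized into the VLM prompt at test time."""
    return kb.get(object_id)

# Minimal usage sketch: a one-entry KB for the spool-removal task (values are placeholders).
kb = {
    "filament_spool": ManipulationLogic(
        object_id="filament_spool",
        interface="spool_holder_axle",
        trajectory=[[0.45, 0.10, 0.30], [0.45, 0.10, 0.45]],
        tolerance_mm=2.0,
        force_impedance={"max_force_N": 15.0, "stiffness_N_per_m": 400.0},
    )
}

tau = retrieve_logic(kb, "filament_spool")
if tau is not None:
    # Serialize τ into a prompt fragment for τ-conditioned planning.
    prompt_fragment = f"Manipulation logic: {tau}"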