MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
By: Liman Wang, Hanyang Zhong, Tianyuan Wang, and more
Potential Business Impact:
Helps robots pick the right fabric for textile manufacturing, apparel production, and smart retail tasks.
Choosing the right fabric is crucial to meet functional and quality requirements in robotic applications for textile manufacturing, apparel production, and smart retail. We present MLLM-Fabric, a robotic framework powered by multimodal large language models (MLLMs) for fabric sorting and selection. The system includes a robotic arm, a camera, a visuotactile sensor, and a pressure sensor. It employs supervised fine-tuning and multimodal explanation-guided knowledge distillation to accurately classify and rank fabric properties. To facilitate further research, we release a dataset of 220 unique fabric samples, including RGB images and synchronized visuotactile and pressure data. Experimental results show that our Fabric-Llama-90B model consistently outperforms pretrained vision-language baselines in both property ranking accuracy and selection reliability.
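The abstract names supervised fine-tuning plus multimodal explanation-guided knowledge distillation but does not spell out the training objective, so the sketch below is only a minimal, assumption-laden illustration of a standard distillation setup: a hard-label term on annotated fabric-property classes blended with a soft term that matches the teacher's output distribution. The function name, the `temperature` and `alpha` parameters, and the loss form are all hypothetical and may differ from the paper's actual method, which additionally uses teacher explanations rather than logits alone.

```python
# Minimal sketch of a knowledge-distillation loss (assumed form, not the
# paper's exact objective). Requires PyTorch.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a supervised fine-tuning term with a teacher-matching KL term."""
    # Hard-label term: cross-entropy against annotated fabric-property labels.
    hard = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-scaled distributions
    # of student and teacher, rescaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft


# Toy usage: 4 samples, 5 hypothetical property levels (e.g. softness ranks).
student = torch.randn(4, 5, requires_grad=True)
teacher = torch.randn(4, 5)
labels = torch.tensor([0, 2, 4, 1])
print(distillation_loss(student, teacher, labels))
```

In the paper's setting, the teacher would be the larger pretrained MLLM and the student the fine-tuned Fabric-Llama model; the explanation-guided component would add a term aligning the student's generated rationales with the teacher's, which is omitted here.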
Similar Papers
Large Language Models as Natural Selector for Embodied Soft Robot Design
Robotics
Uses large language models to select better soft robot designs.
Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception
Robotics
Robots learn to fold clothes from instructions.
An analysis of vision-language models for fabric retrieval
CV and Pattern Recognition
Finds fabric images using text descriptions.