ViTA-Seg: Vision Transformer for Amodal Segmentation in Robotics
By: Donato Caramia, Florian T. Pokorny, Giuseppe Triggiani and more
Potential Business Impact:
Helps robots see hidden objects for picking.
Occlusions in robotic bin picking compromise accurate and reliable grasp planning. We present ViTA-Seg, a class-agnostic Vision Transformer framework for real-time amodal segmentation that leverages global attention to recover complete object masks, including hidden regions. We propose two architectures: a) Single-Head, for amodal mask prediction; b) Dual-Head, for amodal and occluded mask prediction. We also introduce ViTA-SimData, a photo-realistic synthetic dataset tailored to industrial bin-picking scenarios. Extensive experiments on two amodal benchmarks, COCOA and KINS, demonstrate that ViTA-Seg Dual-Head achieves strong amodal and occlusion segmentation accuracy with computational efficiency, enabling robust, real-time robotic manipulation.
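The Dual-Head variant predicts two related masks per object. The amodal mask covers the full object extent, including the hidden parts. The occluded mask covers only the hidden part. A common way to relate them is occluded = amodal AND NOT visible. The sketch below illustrates that relationship on small binary grids. It is not ViTA-Seg's implementation. The helper names `occluded_mask` and `iou` are hypothetical, and mask IoU is assumed as the accuracy measure because the abstract does not name its metric.

```python
from typing import List

# Hypothetical representation: a binary mask as a grid of 0/1 ints
# (1 = pixel belongs to the object).
Mask = List[List[int]]


def occluded_mask(amodal: Mask, visible: Mask) -> Mask:
    """Pixels inside the full (amodal) object extent that are not visible."""
    return [
        [int(a == 1 and v == 0) for a, v in zip(row_a, row_v)]
        for row_a, row_v in zip(amodal, visible)
    ]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks (assumed metric)."""
    inter = sum(p & t for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    union = sum(p | t for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


# Toy object: 2x4 amodal extent whose right half is hidden behind another part.
amodal = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
visible = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]

occ = occluded_mask(amodal, visible)
print(occ)                    # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
print(iou(visible, amodal))   # 0.5
```

The example also shows why the amodal and occluded outputs carry complementary information. Scoring only the visible mask against the full amodal extent recovers half of the object here, and the missing half is exactly the occluded region that a grasp planner would otherwise not know about.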
Similar Papers
Unifying Perception and Action: A Hybrid-Modality Pipeline with Implicit Visual Chain-of-Thought for Robotic Action Generation
Robotics
Robot learns to do tasks by watching and thinking.
Segment Anything, Even Occluded
CV and Pattern Recognition
Helps robots see hidden parts of objects.
RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video
CV and Pattern Recognition
Helps robots see and understand themselves better.