Reasoning Segmentation for Images and Videos: A Survey
By: Yiqing Shen, Chenjia Li, Fei Xiong and more
Potential Business Impact:
Lets computers understand what you mean by words.
Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries, the interpretation of which requires reasoning and knowledge integration. Unlike the traditional formulation of segmentation problems that relies on fixed semantic categories or explicit prompting, RS bridges the gap between visual perception and human-like reasoning capabilities, facilitating more intuitive human-AI interaction through natural language. Our work presents the first comprehensive survey of RS for image and video processing, examining 26 state-of-the-art methods together with a review of the corresponding evaluation metrics, as well as 29 datasets and benchmarks. We also explore existing applications of RS across diverse domains and outline their potential extensions. Finally, we identify current research gaps and highlight promising future directions.
Similar Papers
Temporally-Constrained Video Reasoning Segmentation and Automated Benchmark Construction
CV and Pattern Recognition
Finds objects in videos using text descriptions.
Reinforcing Video Reasoning Segmentation to Think Before It Segments
CV and Pattern Recognition
Helps computers understand what you want to see in videos.
VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
CV and Pattern Recognition
Teaches computers to understand and cut out moving objects.