SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection
By: Jia Lin, Xiaofei Zhou, Jiyuan Liu, and more
Potential Business Impact:
Helps computers spot the most noticeable objects in videos.
Recently, the segment anything model (SAM) has attracted widespread attention, and it is often treated as a vision foundation model for universal segmentation. Some researchers have attempted to apply the foundation model directly to the RGB-D video salient object detection (RGB-D VSOD) task, which runs into three challenges: dependence on manual prompts, the high memory consumption of sequential adapters, and the computational burden of memory attention. To address these limitations, we propose a novel method, namely Segment Anything Model with Depth-guided Adaptive Queries (SAM-DAQ), which adapts SAM2 to pop out salient objects from videos by seamlessly integrating depth and temporal cues within a unified framework. Firstly, we deploy a parallel adapter-based multi-modal image encoder (PAMIE), which incorporates several depth-guided parallel adapters (DPAs) in a skip-connection manner. Notably, we fine-tune the frozen SAM encoder under prompt-free conditions, where the DPA utilizes depth cues to facilitate the fusion of multi-modal features. Secondly, we deploy a query-driven temporal memory (QTM) module, which unifies the memory bank and prompt embeddings into a learnable pipeline. Concretely, by leveraging frame-level queries and video-level queries simultaneously, the QTM module not only selectively extracts temporally consistent features but also iteratively updates the temporal representations of the queries. Extensive experiments are conducted on three RGB-D VSOD datasets, and the results show that the proposed SAM-DAQ consistently outperforms state-of-the-art methods on all evaluation metrics.
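To make the two named components more concrete, below is a minimal PyTorch sketch assuming that a DPA is a lightweight bottleneck adapter whose hidden activation is gated by depth features and added as a skip connection around a frozen encoder block, and that the QTM module is cross-attention between learnable frame-level/video-level queries and frame tokens. All class names, sizes, and the exact wiring here are illustrative assumptions, not the authors' implementation.

```python
from typing import Optional, Tuple

import torch
import torch.nn as nn


class DepthGuidedParallelAdapter(nn.Module):
    """Hypothetical DPA: a bottleneck adapter whose hidden state is gated by depth cues."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)        # compress RGB tokens
        self.depth_gate = nn.Linear(dim, bottleneck)  # project depth tokens to a gate
        self.up = nn.Linear(bottleneck, dim)          # expand back to encoder width
        self.act = nn.GELU()

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # Depth features modulate the bottleneck activation (assumed gating scheme).
        h = self.act(self.down(rgb_feat)) * torch.sigmoid(self.depth_gate(depth_feat))
        return self.up(h)


class QueryDrivenTemporalMemory(nn.Module):
    """Hypothetical QTM: learnable frame- and video-level queries read each frame."""

    def __init__(self, dim: int, n_frame_q: int = 8, n_video_q: int = 4):
        super().__init__()
        self.frame_q = nn.Parameter(torch.randn(n_frame_q, dim) * 0.02)
        self.video_q = nn.Parameter(torch.randn(n_video_q, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(
        self, feats: torch.Tensor, video_state: Optional[torch.Tensor] = None
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # feats: (B, N, dim) tokens of the current frame.
        B = feats.size(0)
        vq = self.video_q.expand(B, -1, -1) if video_state is None else video_state
        q = torch.cat([self.frame_q.expand(B, -1, -1), vq], dim=1)
        out, _ = self.attn(q, feats, feats)  # queries attend to the frame tokens
        q = self.norm(q + out)
        n_f = self.frame_q.size(0)
        # Updated video-level queries carry temporal state to the next frame.
        return q[:, :n_f], q[:, n_f:]


# Usage sketch: fuse modalities with the adapter, then iterate queries over frames.
dim = 256
dpa = DepthGuidedParallelAdapter(dim)
qtm = QueryDrivenTemporalMemory(dim)
rgb = torch.randn(2, 196, dim)    # tokens from a frozen encoder block (assumed shape)
depth = torch.randn(2, 196, dim)  # tokens from the depth stream
fused = rgb + dpa(rgb, depth)     # parallel adapter applied as a skip connection
state = None
for _ in range(3):                # loop over video frames
    frame_q, state = qtm(fused, state)
```

The loop reflects the abstract's claim that the video-level queries are iteratively updated: each frame's output state is fed back in as the next frame's query memory, replacing an explicit memory bank with a learnable, fixed-size representation.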
Similar Papers
SAM 3: Segment Anything with Concepts
CV and Pattern Recognition
Finds and tracks any object you describe.
KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection
Multimedia
Makes cameras see important things in any light.
AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception
CV and Pattern Recognition
Helps self-driving cars see and understand roads.