Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection
By: Zixuan Wang, Haoran Sun, Jiaming Lu and more
Potential Business Impact:
Finds tiny things in dark pictures using words.
Infrared small target detection remains challenging because limited feature representation and severe background interference lead to sub-optimal performance. Recent CLIP-inspired methods attempt to leverage textual guidance for detection, but they are hindered by inaccurate text descriptions and a reliance on manual annotations. To overcome these limitations, we propose DGSPNet, an end-to-end language prompt-driven framework. Our approach integrates dual-granularity semantic prompts: coarse-grained textual priors (e.g., 'infrared image', 'small target') and fine-grained personalized semantic descriptions derived through visual-to-textual mapping within the image space. This design not only facilitates learning fine-grained semantic information but also lets the model leverage language prompts during inference without requiring any manual annotations. To exploit the precision and conciseness of text descriptions, we further introduce a text-guide channel attention (TGCA) mechanism and a text-guide spatial attention (TGSA) mechanism that enhance the model's sensitivity to potential targets across both low- and high-level feature spaces. Extensive experiments demonstrate that our method significantly improves detection accuracy and achieves state-of-the-art performance on three benchmark datasets.
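The abstract does not specify how TGCA and TGSA are computed, so the following is only a minimal sketch of the general idea of text-conditioned channel and spatial gating, not the authors' implementation. The function names, the `projection` matrix, the `text_query` vector, and the use of sigmoid gates are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def text_guided_channel_attention(features, text_embedding, projection):
    """Scale each channel by a gate computed from the text embedding.

    features:       C channels, each an H x W nested list of floats.
    text_embedding: D floats (e.g. from a text encoder; assumed given).
    projection:     C x D weights (hypothetical learned parameters).
    """
    gates = [
        sigmoid(sum(w * t for w, t in zip(row, text_embedding)))
        for row in projection
    ]
    return [
        [[gate * value for value in line] for line in channel]
        for channel, gate in zip(features, gates)
    ]


def text_guided_spatial_attention(features, text_query):
    """Weight each pixel by how well its feature vector matches the text.

    text_query: C floats, the text embedding already mapped to channel
                space (an assumption of this sketch).
    Returns the re-weighted features and the H x W attention mask.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    scale = math.sqrt(channels)
    # Scaled dot product between each pixel's feature vector and the text query.
    mask = [
        [
            sigmoid(
                sum(features[c][y][x] * text_query[c] for c in range(channels))
                / scale
            )
            for x in range(width)
        ]
        for y in range(height)
    ]
    attended = [
        [[features[c][y][x] * mask[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return attended, mask
```

In this sketch the channel gate depends only on the text, while the spatial mask depends on both the text and the image features, so pixels whose features align with the prompt are emphasized. A real model would learn the projections end to end and would likely operate on tensors rather than nested lists.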
Similar Papers
Text-IRSTD: Leveraging Semantic Text to Promote Infrared Small Target Detection in Complex Scenes
CV and Pattern Recognition
Helps computers find tiny things in heat pictures.
Multi-Text Guided Few-Shot Semantic Segmentation
CV and Pattern Recognition
Helps computers see objects better with more descriptions.
DENet: Dual-Path Edge Network with Global-Local Attention for Infrared Small Target Detection
CV and Pattern Recognition
Finds tiny things in blurry heat pictures.