HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework
By: Shuobin Wei, Zhuang Zhou, Zhengan Lu, and others
Potential Business Impact:
Helps robots understand rooms using color and distance.
In RGB-D semantic segmentation for indoor scenes, a key challenge is effectively integrating the rich color information from RGB images with the spatial distance information from depth images. However, most existing methods overlook the inherent differences in how RGB and depth images express information. Distinguishing how RGB and depth images are processed is essential to fully exploiting the unique characteristics of each modality. To address this, we propose a novel heterogeneous dual-branch framework called HDBFormer, specifically designed to handle these modality differences. For RGB images, which contain rich detail, we employ both a basic encoder and a detail encoder to extract global and local features. For the simpler depth images, we propose LDFormer, a lightweight hierarchical encoder that efficiently extracts depth features with fewer parameters. Additionally, we introduce the Modality Information Interaction Module (MIIM), which combines transformers with large-kernel convolutions to efficiently exchange global and local information across modalities. Extensive experiments show that HDBFormer achieves state-of-the-art performance on the NYUDepthv2 and SUN-RGBD datasets. The code is available at: https://github.com/Weishuobin/HDBFormer.
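To make the dual-branch idea concrete, here is a minimal NumPy sketch of the overall data flow: an RGB branch producing local (detail) and global (basic) features, a lightweight depth branch, and a gating-style cross-modal fusion. All function names, shapes, and the fusion rule are illustrative assumptions, not the paper's actual encoders or MIIM.

```python
import numpy as np

# Hypothetical feature-map dimensions for the sketch.
H, W, C = 8, 8, 16
rng = np.random.default_rng(0)

def rgb_encoder(rgb):
    """Stand-in for the RGB branch: a 'detail' path (local 3x3 mean
    filter) and a 'basic' path (broadcast global average). Both are
    placeholders for the paper's two RGB encoders."""
    padded = np.pad(rgb, ((1, 1), (1, 1), (0, 0)), mode="edge")
    local = np.zeros_like(rgb)
    for dy in range(3):
        for dx in range(3):
            local += padded[dy:dy + H, dx:dx + W]
    local /= 9.0
    global_ = np.broadcast_to(rgb.mean(axis=(0, 1)), rgb.shape)
    return local, global_

def depth_encoder(depth):
    """Lightweight depth branch: a single-channel map projected to C
    features via a hypothetical 1x1 projection (few parameters)."""
    w = rng.standard_normal((1, C)) * 0.1
    return depth[..., None] @ w  # (H, W, 1) @ (1, C) -> (H, W, C)

def fuse(local, global_, depth_feat):
    """Toy cross-modal interaction: depth features gate the mix of
    local and global RGB features via a sigmoid. This only illustrates
    modality interaction; it is not the paper's MIIM."""
    gate = 1.0 / (1.0 + np.exp(-depth_feat))
    return gate * local + (1.0 - gate) * global_

rgb = rng.standard_normal((H, W, C))
depth = rng.standard_normal((H, W))
local, global_ = rgb_encoder(rgb)
fused = fuse(local, global_, depth_encoder(depth))
print(fused.shape)  # (8, 8, 16)
```

The key design point the sketch mirrors is asymmetry: the RGB branch does more work (two paths) than the depth branch (one cheap projection), and fusion happens on aligned per-pixel feature maps.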
Similar Papers
DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation
CV and Pattern Recognition
Helps robots understand rooms by seeing and measuring.
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
CV and Pattern Recognition
Helps computers see better in dark or bright light.
PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
CV and Pattern Recognition
Makes computers understand pictures better without special cameras.