RoadMamba: A Dual Branch Visual State Space Model for Road Surface Classification
By: Tianze Wang, Zhang Zhang, Chao Yue and more
Potential Business Impact:
Helps self-driving cars see road texture better.
Acquiring road surface conditions in advance with vision-based techniques provides effective information for the planning and control system of autonomous vehicles, thus improving the safety and driving comfort of the vehicles. Recently, the Mamba architecture based on state-space models has shown remarkable performance in visual processing tasks, benefiting from its efficient global receptive field. However, existing Mamba architectures struggle to achieve state-of-the-art visual road surface classification because they do not effectively extract the local texture of the road surface. In this paper, we explore for the first time the potential of visual Mamba architectures for the road surface classification task and propose a method that effectively combines local and global perception, called RoadMamba. Specifically, we use the Dual State Space Model (DualSSM) to extract the global semantics and local texture of the road surface, and we decode and fuse the dual features through Dual Attention Fusion (DAF). In addition, we propose a dual auxiliary loss to explicitly constrain the two branches, preventing the network from relying only on global semantic information from the deep, large receptive field and ignoring the local texture. The proposed RoadMamba achieves state-of-the-art performance in experiments on a large-scale road surface classification dataset containing 1 million samples.
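The abstract does not state the exact form of the dual auxiliary loss. As a rough illustration only, the sketch below assumes a common pattern: a main cross-entropy term on the fused prediction plus weighted cross-entropy terms on separate global-branch and local-branch predictions. The function names, the `aux_weight` parameter, and the additive weighting are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one sample given raw logits and a class index."""
    return -math.log(softmax(logits)[target])


def dual_auxiliary_loss(fused_logits, global_logits, local_logits,
                        target, aux_weight=0.5):
    """Hypothetical combined loss (an assumption, not the paper's formula).

    The main term supervises the fused prediction. The two auxiliary terms
    supervise each branch directly, so neither the global nor the local
    branch can be ignored during training.
    """
    main = cross_entropy(fused_logits, target)
    aux_global = cross_entropy(global_logits, target)
    aux_local = cross_entropy(local_logits, target)
    return main + aux_weight * (aux_global + aux_local)


if __name__ == "__main__":
    # Toy logits for a 4-class problem; the true class is index 1.
    fused = [0.2, 2.0, 0.1, -0.5]
    global_branch = [0.1, 1.5, 0.3, 0.0]
    local_branch = [1.2, 0.4, 0.0, -0.3]  # local branch is wrong here
    print(dual_auxiliary_loss(fused, global_branch, local_branch, target=1))
```

Under this assumed form, a branch that misclassifies the sample raises the total loss even when the fused prediction is correct, which is one way a constraint on both branches could be expressed.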
Similar Papers
PathMamba: A Hybrid Mamba-Transformer for Topologically Coherent Road Segmentation in Satellite Imagery
CV and Pattern Recognition
Maps roads better using faster, smarter computer vision.
DefMamba: Deformable Visual State Space Model
CV and Pattern Recognition
Finds important parts of pictures better.
A Separable Self-attention Inspired by the State Space Model for Computer Vision
CV and Pattern Recognition
Makes computers see pictures faster and better.