Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation
By: Yiheng Lyu, Lian Xu, Mohammed Bennamoun, and more
Potential Business Impact:
Helps doctors outline organs and tumors in 3D scans with far less hand labeling.
Weakly supervised semantic segmentation offers a label-efficient solution to train segmentation models for volumetric medical imaging. However, existing approaches often rely on 2D encoders that neglect the inherent volumetric nature of the data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context for weakly supervised volumetric medical segmentation. TranSamba augments a standard Vision Transformer backbone with Cross-Plane Mamba blocks, which leverage the linear complexity of state space models for efficient information exchange across neighboring slices. The information exchange enhances the pairwise self-attention within slices computed by the Transformer blocks, directly contributing to the attention maps for object localization. TranSamba achieves effective volumetric modeling with time complexity that scales linearly with the input volume depth and maintains constant memory usage for batch processing. Extensive experiments on three datasets demonstrate that TranSamba establishes new state-of-the-art performance, consistently outperforming existing methods across diverse modalities and pathologies. Our source code and trained models are openly accessible at: https://github.com/YihengLyu/TranSamba.
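The abstract's efficiency claim rests on the linear-time recurrence of state space models: information flows across neighboring slices in one pass, instead of the quadratic cost of pairwise attention over the depth axis. Below is a minimal sketch of such a cross-slice scan, assuming a single scalar feature per slice and fixed scan parameters; the actual Cross-Plane Mamba blocks use learned, input-dependent (selective) parameters over full token maps, and the function name here is illustrative, not from the paper's code.

```python
# Minimal sketch of a linear-time state-space scan across slices, in the
# spirit of the Cross-Plane Mamba blocks described in the abstract.
# Assumptions (not from the paper's implementation): one scalar feature per
# slice, fixed scalar parameters a, b, c instead of learned selective ones.

def cross_plane_scan(x, a, b, c):
    """Scan a depth-ordered sequence of per-slice features.

    Recurrence: h_d = a * h_{d-1} + b * x_d, output y_d = c * h_d.
    One pass over the D slices costs O(D) time with O(1) extra state,
    versus O(D^2) for pairwise attention across all slice pairs.
    """
    h = 0.0
    y = []
    for x_d in x:              # iterate through neighboring slices in depth order
        h = a * h + b * x_d    # hidden state carries context from earlier slices
        y.append(c * h)        # per-slice readout, now depth-aware
    return y


# With a=0 the state forgets instantly: each output sees only its own slice.
# With a=1 the state accumulates context from all preceding slices.
print(cross_plane_scan([1.0, 2.0, 3.0], a=0.0, b=1.0, c=1.0))  # [1.0, 2.0, 3.0]
print(cross_plane_scan([1.0, 2.0, 3.0], a=1.0, b=1.0, c=1.0))  # [1.0, 3.0, 6.0]
```

Because the state is a fixed-size summary rather than a cache of all previous slices, memory per volume stays constant as depth grows, which matches the abstract's constant-memory batching claim.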
Similar Papers
HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation
CV and Pattern Recognition
Helps doctors see inside bodies better.
A Comprehensive Analysis of Mamba for 3D Volumetric Medical Image Segmentation
CV and Pattern Recognition
Helps doctors see inside bodies better, faster.
PathMamba: A Hybrid Mamba-Transformer for Topologically Coherent Road Segmentation in Satellite Imagery
CV and Pattern Recognition
Maps roads better using faster, smarter computer vision.