MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention
By: Zunhui Xia, Hongxing Li, Libin Lan
Potential Business Impact:
Helps doctors find diseases in medical scans faster and more accurately.
Medical image recognition plays a key role in aiding clinical diagnosis, enabling more accurate and timely identification of diseases and abnormalities. Vision transformer-based approaches have proven effective in handling various medical recognition tasks. However, these methods face two primary challenges. First, they are often task-specific and architecture-tailored, limiting their general applicability. Second, they usually either adopt full attention to model long-range dependencies, incurring high computational costs, or rely on handcrafted sparse attention, which can lead to suboptimal performance. To tackle these issues, we present MedFormer, an efficient medical vision transformer built on two key ideas. First, it employs a pyramid scaling structure as a versatile backbone for various medical image recognition tasks, including image classification and dense prediction tasks such as semantic segmentation and lesion detection. This structure facilitates hierarchical feature representation while reducing the computational load of the feature maps, which is highly beneficial for boosting performance. Second, it introduces a novel Dual Sparse Selection Attention (DSSA) with content awareness to improve computational efficiency and robustness against noise while maintaining high performance. As the core building block of MedFormer, DSSA is designed to explicitly attend to the most relevant content. Theoretical analysis shows that MedFormer outperforms existing medical vision transformers in terms of generality and efficiency. Extensive experiments on datasets spanning various imaging modalities show that MedFormer consistently improves performance across all three medical image recognition tasks mentioned above. MedFormer thus provides an efficient and versatile solution for medical image recognition, with strong potential for clinical application.
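The abstract describes DSSA only at this high level. As a rough illustration, the following is a minimal PyTorch sketch of what a content-aware dual sparse selection could look like, assuming the two selections are (1) routing each query region to its top-k most relevant key regions and (2) keeping only the top-n highest-scoring tokens within those routed regions. The module name, arguments, and single-head simplification are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of content-aware Dual Sparse Selection Attention (DSSA).
# Assumed behavior: (1) each query region attends only to its top-k most relevant
# key regions, and (2) within those regions only the top-n highest-scoring tokens
# are kept. Single-head, for clarity; names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualSparseSelectionAttention(nn.Module):
    def __init__(self, dim, num_regions=4, top_k_regions=2, top_n_tokens=8):
        super().__init__()
        self.num_regions = num_regions          # regions per spatial axis
        self.top_k_regions = top_k_regions      # first (region-level) selection
        self.top_n_tokens = top_n_tokens        # second (token-level) selection
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (B, H, W, C) feature map; H and W must be divisible by num_regions
        B, H, W, C = x.shape
        r = self.num_regions
        hr, wr = H // r, W // r                 # tokens per region along each axis
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Partition into r*r regions of hr*wr tokens each: (B, r*r, hr*wr, C)
        def to_regions(t):
            t = t.view(B, r, hr, r, wr, C).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(B, r * r, hr * wr, C)

        qr, kr, vr = map(to_regions, (q, k, v))

        # --- Selection 1: content-aware region routing ---
        # Region descriptors are mean-pooled queries/keys; each query region picks
        # its top-k key regions by affinity.
        q_desc, k_desc = qr.mean(dim=2), kr.mean(dim=2)           # (B, r*r, C)
        region_affinity = q_desc @ k_desc.transpose(-1, -2)       # (B, r*r, r*r)
        topk_idx = region_affinity.topk(self.top_k_regions, dim=-1).indices

        # Gather keys/values of the selected regions: (B, r*r, k*hr*wr, C)
        idx = topk_idx[..., None, None].expand(-1, -1, -1, hr * wr, C)
        k_sel = torch.gather(kr.unsqueeze(1).expand(-1, r * r, -1, -1, -1), 2, idx)
        v_sel = torch.gather(vr.unsqueeze(1).expand(-1, r * r, -1, -1, -1), 2, idx)
        k_sel = k_sel.reshape(B, r * r, -1, C)
        v_sel = v_sel.reshape(B, r * r, -1, C)

        # --- Selection 2: token-level sparse selection within routed regions ---
        attn = (qr @ k_sel.transpose(-1, -2)) * self.scale        # (B, r*r, hr*wr, k*hr*wr)
        n = min(self.top_n_tokens, attn.shape[-1])
        thresh = attn.topk(n, dim=-1).values[..., -1:]            # n-th largest score
        attn = attn.masked_fill(attn < thresh, float('-inf'))     # drop irrelevant tokens
        out = F.softmax(attn, dim=-1) @ v_sel                     # (B, r*r, hr*wr, C)

        # Merge regions back to (B, H, W, C) and project
        out = out.view(B, r, r, hr, wr, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)
```

As a usage illustration under these assumptions, a (2, 32, 32, 64) feature map with num_regions=4 gives 16 regions of 8x8 tokens, and each region attends to at most top_k_regions * 64 keys before the token-level pruning further sparsifies the attention map.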
Similar Papers
TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
CV and Pattern Recognition
Helps doctors see inside bodies better.
DuoFormer: Leveraging Hierarchical Representations by Local and Global Attention Vision Transformer
CV and Pattern Recognition
Helps doctors see diseases better in medical pictures.
HBFormer: A Hybrid-Bridge Transformer for Microtumor and Miniature Organ Segmentation
CV and Pattern Recognition
Finds tiny tumors and organs in medical scans.