TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression

Published: January 8, 2026 | arXiv ID: 2601.04519v1

By: Sen Zeng, Hong Zhou, Zheng Zhu and more

Potential Business Impact:

Finds tumors faster and uses less computer power.

Business Areas:

Image Recognition Data and Analytics, Software

Three-dimensional medical image segmentation is a fundamental yet computationally demanding task, owing to the cubic growth of voxel processing and to redundant computation on homogeneous regions. To address these limitations, we propose TokenSeg, a boundary-aware sparse token representation framework for efficient 3D medical volume segmentation. Specifically, (1) we design a multi-scale hierarchical encoder that extracts 400 candidate tokens across four resolution levels to capture both global anatomical context and fine boundary details; (2) we introduce a boundary-aware tokenizer that combines VQ-VAE quantization with importance scoring to select 100 salient tokens, over 60% of which lie near tumor boundaries; and (3) we develop a sparse-to-dense decoder that reconstructs full-resolution masks through token reprojection, progressive upsampling, and skip connections. Extensive experiments on a 3D breast DCE-MRI dataset comprising 960 cases demonstrate that TokenSeg achieves state-of-the-art performance with 94.49% Dice and 89.61% IoU, while reducing GPU memory and inference latency by 64% and 68%, respectively. To verify generalization, we also evaluate on the MSD cardiac and brain MRI benchmark datasets, where TokenSeg consistently delivers optimal performance across heterogeneous anatomical structures. These results highlight the effectiveness of anatomically informed sparse representation for accurate and efficient 3D medical image segmentation.
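
The abstract names two ideas that are easy to illustrate in isolation: keeping the 100 highest-scoring tokens out of 400 candidates, and reporting overlap with Dice and IoU. The following is a minimal sketch in plain Python, not the authors' implementation. The function names `select_salient_tokens` and `dice_and_iou` are hypothetical, the importance scores are random placeholders, and the paper's actual scoring (which combines VQ-VAE quantization with boundary awareness) is not reproduced here.

```python
import random


def select_salient_tokens(scores, k):
    """Return the indices of the k highest-scoring tokens, in original order.

    Assumption: one scalar importance score per candidate token.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of tokens")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def dice_and_iou(pred, target):
    """Compute Dice and IoU for two equal-length binary masks (flattened voxels)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


# Placeholder scores for 400 candidate tokens (random, for illustration only).
rng = random.Random(0)
scores = [rng.random() for _ in range(400)]
kept = select_salient_tokens(scores, 100)

# Toy masks: 2 predicted voxels, 1 true voxel, 1 voxel in common.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy case the masks share one voxel, so Dice is 2/3 and IoU is 1/2. On a single mask pair the two metrics are related by IoU = Dice / (2 - Dice); figures averaged over many cases, as benchmark results usually are, need not satisfy that identity exactly.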

Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging

CV and Pattern Recognition

Helps doctors understand body scans better.

23 Oct 2025

SwinTF3D: A Lightweight Multimodal Fusion Approach for Text-Guided 3D Medical Image Segmentation

CV and Pattern Recognition

Lets doctors find body parts using words.

28 Dec 2025

HER-Seg: Holistically Efficient Segmentation for High-Resolution Medical Images

Image and Video Processing

Helps doctors see tiny details in medical scans.

8 Apr 2025

Country of Origin

🇬🇧 United Kingdom

Page Count

13 pages
