
Differentiable Hierarchical Visual Tokenization

Published: November 4, 2025 | arXiv ID: 2511.02652v1

By: Marius Aasan, Martine Hjelkrem-Tan, Nico Catalano and more

Potential Business Impact:

Lets computer vision models split images into tokens that follow image content instead of a fixed patch grid.

Business Areas:

Image Recognition, Data and Analytics, Software

Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remaining backward-compatible with existing architectures for retrofitting pretrained models. Our method uses hierarchical model selection with information criteria to provide competitive performance in both image-level classification and dense-prediction tasks, and even supports out-of-the-box raster-to-vector conversion.
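
To make the phrase "hierarchical model selection with information criteria" concrete, here is a minimal toy sketch. It is not the paper's tokenizer and it is not differentiable. It assumes a grayscale image stored as a list of rows, a hierarchy made of square blocks of increasing size, a piecewise-constant Gaussian model per block, and the Bayesian information criterion (BIC) as the selection rule. The names `block_rss`, `bic`, and `select_block_size` are hypothetical and chosen only for this illustration.

```python
import math
import random


def block_rss(img, b):
    """Residual sum of squares after replacing each b-by-b block with its mean.

    Assumes the image height and width are both divisible by b.
    """
    h, w = len(img), len(img[0])
    rss = 0.0
    for r0 in range(0, h, b):
        for c0 in range(0, w, b):
            vals = [img[r][c]
                    for r in range(r0, r0 + b)
                    for c in range(c0, c0 + b)]
            mean = sum(vals) / len(vals)
            rss += sum((v - mean) ** 2 for v in vals)
    return rss


def bic(img, b, eps=1e-12):
    """Gaussian BIC (up to an additive constant): n*ln(RSS/n) + k*ln(n).

    k is the number of blocks, i.e. one mean parameter per token.
    eps guards against log(0) when the fit is exact.
    """
    h, w = len(img), len(img[0])
    n = h * w
    k = (h // b) * (w // b)
    return n * math.log(block_rss(img, b) / n + eps) + k * math.log(n)


def select_block_size(img, sizes=(2, 4, 8, 16)):
    """Pick the hierarchy level (block size) with the lowest BIC."""
    return min(sizes, key=lambda b: bic(img, b))


# Synthetic 16x16 image: sixteen 4x4 regions of constant intensity plus mild noise.
random.seed(0)
img = [[(4 * (r // 4) + (c // 4)) / 16.0 + random.gauss(0.0, 0.01)
        for c in range(16)]
       for r in range(16)]

best = select_block_size(img)
print(best)
```

In this sketch, finer blocks lower the residual but pay a larger parameter penalty, and coarser blocks do the reverse, so the criterion settles on the level that matches the structure actually present in the image. A content-adaptive tokenizer would make this choice per region rather than once per image.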