Consistent View Alignment Improves Foundation Models for 3D Medical Image Segmentation
By: Puru Vaish, Felix Meister, Tobias Heimann, and more
Potential Business Impact:
Improves how foundation models learn from multiple views of 3D medical images, strengthening downstream segmentation performance.
Many recent approaches in representation learning implicitly assume that uncorrelated views of a data point are sufficient to learn meaningful representations for various downstream tasks. In this work, we challenge this assumption and demonstrate that meaningful structure in the latent space does not emerge naturally; it must be explicitly induced. We propose a method that aligns representations from different views of the data, integrating complementary information without inducing false positives. Our experiments show that the proposed self-supervised learning method, Consistent View Alignment, improves performance on downstream tasks, highlighting the critical role of structured view alignment in learning effective representations. Our method achieved first and second place in the MICCAI 2025 SSL3D challenge when using a Primus vision transformer and a ResEnc convolutional neural network, respectively. The code and pretrained model weights are released at https://github.com/Tenbatsu24/LatentCampus.
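The central idea, pulling together representations of paired views of the same data point while never pulling together unrelated items, can be illustrated with a minimal sketch. The loss below is a generic positive-pair cosine-alignment objective, not the authors' exact Consistent View Alignment formulation; all function names here are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def view_alignment_loss(views_a, views_b):
    """Average (1 - cosine similarity) over paired view embeddings.

    views_a[i] and views_b[i] are embeddings of two views of the same
    data point (a positive pair). Only these known-positive pairs are
    aligned; unrelated items are never treated as positives, which is
    one way to avoid false-positive alignment.
    """
    assert len(views_a) == len(views_b), "views must be paired"
    sims = [cosine_similarity(u, v) for u, v in zip(views_a, views_b)]
    return 1.0 - sum(sims) / len(sims)
```

In practice the embeddings would come from an encoder applied to two augmented views of the same 3D volume, and the loss would be minimized jointly with the rest of the self-supervised objective.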