An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars
By: Barkin Buyukcakir, Jannick De Tobel, Patrick Thevissen and more
Potential Business Impact:
Helps experts estimate age from teeth more accurately, and shows why the model may be uncertain.
The practical adoption of deep learning in high-stakes forensic applications, such as dental age estimation, is often limited by the 'black box' nature of the models. This study introduces a framework designed to enhance both performance and transparency in this context. We use a notable performance disparity in the automated staging of mandibular second (tooth 37) and third (tooth 38) molars as a case study. The proposed framework, which combines a convolutional autoencoder (AE) with a Vision Transformer (ViT), improves classification accuracy for both teeth over a baseline ViT, raising it from 0.712 to 0.815 for tooth 37 and from 0.462 to 0.543 for tooth 38. Beyond improving performance, the framework provides multi-faceted diagnostic insights. Analysis of the AE's latent space metrics and image reconstructions indicates that the remaining performance gap is data-centric, suggesting that high intra-class morphological variability in the tooth 38 dataset is a primary limiting factor. This work highlights the insufficiency of relying on a single mode of interpretability, such as attention maps, which can appear anatomically plausible yet fail to identify underlying data issues. By offering a methodology that both enhances accuracy and provides evidence for why a model may be uncertain, this framework serves as a more robust tool to support expert decision-making in forensic age estimation.
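To put the reported accuracies side by side, the short Python sketch below computes the absolute and relative gains for each tooth and the gap between the two teeth before and after applying the framework. It uses only the four accuracy values quoted in the abstract. The names `REPORTED`, `accuracy_gain`, and `tooth_gap` are illustrative and do not come from the paper.

```python
# Accuracies quoted in the abstract: (baseline ViT, AE + ViT framework).
# The dictionary and function names are illustrative, not from the paper.
REPORTED = {
    "tooth 37": (0.712, 0.815),
    "tooth 38": (0.462, 0.543),
}


def accuracy_gain(baseline, proposed):
    """Return (absolute gain, relative gain) of proposed over baseline."""
    absolute = proposed - baseline
    return absolute, absolute / baseline


def tooth_gap(index):
    """Accuracy of tooth 37 minus tooth 38 (index 0 = baseline, 1 = framework)."""
    return REPORTED["tooth 37"][index] - REPORTED["tooth 38"][index]


for tooth, (baseline, proposed) in REPORTED.items():
    absolute, relative = accuracy_gain(baseline, proposed)
    print(f"{tooth}: +{absolute:.3f} absolute, +{relative:.1%} relative")

print(f"gap between teeth: {tooth_gap(0):.3f} (baseline) -> {tooth_gap(1):.3f} (framework)")
```

With these figures, tooth 37 gains 0.103 (about 14.5% relative) and tooth 38 gains 0.081 (about 17.5% relative), while the difference between the two teeth moves from 0.250 to 0.272. This is simple arithmetic on the reported numbers. It shows that both teeth improve but the gap between them does not close, which is the gap the abstract attributes to the tooth 38 data.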
Similar Papers
Masked Registration and Autoencoding of CT Images for Predictive Tibia Reconstruction
CV and Pattern Recognition
Helps doctors rebuild broken leg bones perfectly.
When CNNs Outperform Transformers and Mambas: Revisiting Deep Architectures for Dental Caries Segmentation
CV and Pattern Recognition
Finds cavities in X-rays better than other methods.
Functional Localization Enforced Deep Anomaly Detection Using Fundus Images
CV and Pattern Recognition
Finds eye diseases in pictures better.