Score: 2

A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading

Published: October 30, 2025 | arXiv ID: 2510.26315v1

By: Junlai Qiu , Yunzhu Chen , Hao Zheng and more

BigTech Affiliations: Tencent

Potential Business Impact:

Helps doctors spot eye disease earlier and better.

Business Areas:
Image Recognition Data and Analytics, Software

Diabetic retinopathy (DR) is a leading cause of vision loss among middle-aged and elderly people, which significantly impacts their daily lives and mental health. To improve the efficiency of clinical screening and enable the early detection of DR, a variety of automated DR diagnosis systems have been recently established based on convolutional neural network (CNN) or vision Transformer (ViT). However, due to the own shortages of CNN / ViT, the performance of existing methods using single-type backbone has reached a bottleneck. One potential way for the further improvements is integrating different kinds of backbones, which can fully leverage the respective strengths of them (\emph{i.e.,} the local feature extraction capability of CNN and the global feature capturing ability of ViT). To this end, we propose a novel paradigm to effectively fuse the features extracted by different backbones based on the theory of evidence. Specifically, the proposed evidential fusion paradigm transforms the features from different backbones into supporting evidences via a set of deep evidential networks. With the supporting evidences, the aggregated opinion can be accordingly formed, which can be used to adaptively tune the fusion pattern between different backbones and accordingly boost the performance of our hybrid model. We evaluated our method on two publicly available DR grading datasets. The experimental results demonstrate that our hybrid model not only improves the accuracy of DR grading, compared to the state-of-the-art frameworks, but also provides the excellent interpretability for feature fusion and decision-making.

Country of Origin
🇨🇳 China

Page Count
11 pages

Category
Computer Science:
CV and Pattern Recognition