Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval
By: Fangke Chen, Tianhao Dong, Sirry Chen and more
Potential Business Impact:
Lets computers read any handwriting, anywhere.
Handwritten word retrieval is vital for digital archives but remains challenging due to large handwriting variability and cross-lingual semantic gaps. While large vision-language models offer potential solutions, their prohibitive computational costs hinder practical edge deployment. To address this, we propose a lightweight asymmetric dual-encoder framework that learns unified, style-invariant visual embeddings. By jointly optimizing instance-level alignment and class-level semantic consistency, our approach anchors visual embeddings to language-agnostic semantic prototypes, enforcing invariance across scripts and writing styles. Experiments show that our method outperforms 28 baselines and achieves state-of-the-art accuracy on within-language retrieval benchmarks. We further conduct explicit cross-lingual retrieval, where the query language differs from the target language, to validate the effectiveness of the learned cross-lingual representations. Achieving strong performance with only a fraction of the parameters required by existing models, our framework enables accurate and resource-efficient cross-script handwriting retrieval.
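The joint objective described above can be illustrated with a minimal sketch. The abstract does not give the actual loss functions, so everything here is an assumption: instance-level alignment is modeled as a symmetric InfoNCE loss between paired image and semantic embeddings, class-level consistency as a cross-entropy over cosine similarities to one prototype per word class, and the names `instance_alignment_loss`, `prototype_consistency_loss`, `joint_loss`, `temperature`, and `weight` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def instance_alignment_loss(img_emb, sem_emb, temperature=0.07):
    """Assumed instance-level term: symmetric InfoNCE over a batch.

    Row i of img_emb is treated as the positive pair of row i of sem_emb;
    all other rows in the batch act as negatives.
    """
    img = l2_normalize(img_emb)
    sem = l2_normalize(sem_emb)
    logits = img @ sem.T / temperature
    idx = np.arange(len(img))
    img_to_sem = -log_softmax(logits, axis=1)[idx, idx].mean()
    sem_to_img = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (img_to_sem + sem_to_img)


def prototype_consistency_loss(img_emb, prototypes, labels, temperature=0.07):
    """Assumed class-level term: pull each image toward its class prototype.

    prototypes has one row per word class, shared across scripts and writers.
    """
    img = l2_normalize(img_emb)
    protos = l2_normalize(prototypes)
    logits = img @ protos.T / temperature
    idx = np.arange(len(labels))
    return -log_softmax(logits, axis=1)[idx, labels].mean()


def joint_loss(img_emb, sem_emb, prototypes, labels, weight=1.0):
    """Sum of both terms; weight is a hypothetical balancing coefficient."""
    return (instance_alignment_loss(img_emb, sem_emb)
            + weight * prototype_consistency_loss(img_emb, prototypes, labels))
```

In this sketch, two handwriting images of the same word in different scripts share a label and are therefore pulled toward the same prototype row, which is one plausible way to obtain the script and style invariance the abstract describes. Note that the in-batch InfoNCE term treats repeated words within a batch as negatives, a simplification a real implementation would likely need to handle.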