ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations

Published: November 15, 2025 | arXiv ID: 2511.12249v1

By: Khang T. Huynh, Dung H. Nguyen, Binh T. Nguyen

Potential Business Impact:

Helps computers tell apart the different meanings a Vietnamese word can take in context.

Business Areas:

Semantic Search, Internet Services

Recent advances in contextualized word embeddings have greatly improved semantic tasks such as Word Sense Disambiguation (WSD) and contextual similarity, but most progress has been limited to high-resource languages like English. Vietnamese, in contrast, still lacks robust models and evaluation resources for fine-grained semantic understanding. In this paper, we present ViConBERT, a novel framework for learning Vietnamese contextualized embeddings that integrates contrastive learning (SimCLR) and gloss-based distillation to better capture word meaning. We also introduce ViConWSD, the first large-scale synthetic dataset for evaluating semantic understanding in Vietnamese, covering both WSD and contextual similarity. Experimental results show that ViConBERT outperforms strong baselines on WSD (F1 = 0.87) and achieves competitive performance on ViCon (AP = 0.88) and ViSim-400 (Spearman's rho = 0.60), demonstrating its effectiveness in modeling both discrete senses and graded semantic relations. Our code, models, and data are available at https://github.com/tkhangg0910/ViConBERT
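The abstract names contrastive learning in the style of SimCLR as one ingredient for aligning a word's contextual embedding with the embedding of its gloss. The sketch below illustrates that general idea only. It is a simplified, one-directional InfoNCE-style loss, not the loss used in the paper; the function names `cosine` and `info_nce_loss`, the toy 2-dimensional vectors, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(contexts, glosses, temperature=0.5):
    """Mean InfoNCE-style loss over a batch (hypothetical helper).

    contexts[i] is treated as the positive match for glosses[i];
    every other gloss in the batch serves as a negative.
    """
    total = 0.0
    for i, ctx in enumerate(contexts):
        # Temperature-scaled similarities to every gloss in the batch.
        logits = [cosine(ctx, g) / temperature for g in glosses]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the correct gloss.
        total += log_denom - logits[i]
    return total / len(contexts)


# Toy batch: two context embeddings and two gloss embeddings.
contexts = [[1.0, 0.0], [0.0, 1.0]]
aligned_glosses = [[1.0, 0.0], [0.0, 1.0]]
swapped_glosses = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce_loss(contexts, aligned_glosses)
swapped_loss = info_nce_loss(contexts, swapped_glosses)
```

In this toy batch the loss is lower when each context vector points in the same direction as its own gloss than when the glosses are swapped, which is the behavior a contrastive objective rewards. A real setup would compute the embeddings with a trained encoder and typically use a batched tensor implementation.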

Repos / Data Links

github.com

Page Count

12 pages
