An Encoder-Integrated PhoBERT with Graph Attention for Vietnamese Token-Level Classification
By: Ba-Quang Nguyen
Potential Business Impact:
Helps computers understand Vietnamese text better.
We propose a novel neural architecture named TextGraphFuseGAT, which integrates a pretrained transformer encoder (PhoBERT) with Graph Attention Networks (GATs) for token-level classification tasks. The proposed model constructs a fully connected graph over the token embeddings produced by PhoBERT, enabling the GAT layer to capture rich inter-token dependencies beyond those modeled by sequential context alone. To further enhance contextualization, a Transformer-style self-attention layer is applied on top of the graph-enhanced embeddings, and the final token representations are passed through a classification head to perform sequence labeling. We evaluate our approach on three Vietnamese benchmark datasets: PhoNER_COVID19 for named entity recognition (NER) in the COVID-19 domain, PhoDisfluency for speech disfluency detection, and VietMed-NER for medical-domain NER. VietMed-NER is the first Vietnamese medical spoken NER dataset, featuring 18 entity types collected from real-world medical speech transcripts and annotated with the BIO tagging scheme; its specialized vocabulary and domain-specific expressions make it a challenging benchmark for token-level classification models. Experimental results show that our method consistently outperforms strong baselines, including transformer-only models and hybrid neural architectures such as BiLSTM + CNN + CRF, confirming the effectiveness of combining pretrained semantic features with graph-based relational modeling for token classification across multiple domains.
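The abstract describes a four-stage pipeline: PhoBERT encoding, a GAT layer over a fully connected token graph, Transformer-style self-attention, and a per-token classification head. The PyTorch sketch below shows one plausible realization of that pipeline; the head count, the single-head dense GAT formulation, and module names such as FullyConnectedGAT are our assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of the TextGraphFuseGAT pipeline described in the abstract.
# Hyperparameters and layer choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class FullyConnectedGAT(nn.Module):
    """Single-head GAT attention (Velickovic et al., 2018) in dense form:
    on a fully connected graph, every token attends to every other token."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)
        self.attn_src = nn.Linear(dim, 1, bias=False)
        self.attn_dst = nn.Linear(dim, 1, bias=False)

    def forward(self, x, mask):
        # x: (batch, seq, dim); mask: (batch, seq), 1 for real tokens
        h = self.proj(x)
        # e_ij = LeakyReLU(a_src . h_i + a_dst . h_j) for every token pair
        scores = self.attn_src(h) + self.attn_dst(h).transpose(1, 2)
        scores = F.leaky_relu(scores, negative_slope=0.2)
        # exclude padding tokens from the key dimension
        scores = scores.masked_fill(mask.unsqueeze(1) == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)   # (batch, seq, seq)
        return F.elu(alpha @ h)                 # aggregated node features

class TextGraphFuseGAT(nn.Module):
    def __init__(self, num_labels, model_name="vinai/phobert-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        dim = self.encoder.config.hidden_size
        self.gat = FullyConnectedGAT(dim)
        # Transformer-style self-attention on the graph-enhanced embeddings
        self.self_attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, input_ids, attention_mask):
        x = self.encoder(input_ids,
                         attention_mask=attention_mask).last_hidden_state
        x = self.gat(x, attention_mask)
        x = self.self_attn(x, src_key_padding_mask=(attention_mask == 0))
        return self.classifier(x)  # per-token logits for BIO labels
```

Because the graph is fully connected, no explicit edge list is needed: the GAT attention coefficients are computed densely for every token pair, with padding positions masked out, which is what lets the layer capture inter-token dependencies beyond the sequential context PhoBERT already models.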
Similar Papers
BERT-based model for Vietnamese Fact Verification Dataset
Computation and Language
Helps check if Vietnamese news is true.
ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations
Computation and Language
Helps computers understand Vietnamese words better.
Advancing Text Classification with Large Language Models and Neural Attention Mechanisms
Computation and Language
Helps computers understand and sort text better.