G2rammar: Bilingual Grammar Modeling for Enhanced Text-attributed Graph Learning
By: Heng Zheng, Haochen You, Zijun Liu, and more
Potential Business Impact:
Teaches computers to understand how information connects.
Text-attributed graphs require models to effectively integrate both structural topology and semantic content. Recent approaches apply large language models to graphs by linearizing structures into token sequences through random walks. These methods create concise graph vocabularies to replace verbose natural language descriptions. However, they overlook a critical component that makes language expressive: grammar. In natural language, grammar assigns syntactic roles to words and defines their functions within sentences. Similarly, nodes in graphs play distinct structural roles as hubs, bridges, or peripheral members. Current graph language methods provide tokens without grammatical annotations to indicate these structural or semantic roles. This absence limits language models' ability to reason about graph topology effectively. We propose G2rammar, a bilingual grammar framework that explicitly encodes both structural and semantic grammar for text-attributed graphs. Structural grammar characterizes topological roles through centrality and neighborhood patterns. Semantic grammar captures content relationships through textual informativity. The framework implements two-stage learning with structural grammar pre-training followed by semantic grammar fine-tuning. Extensive experiments on real-world datasets demonstrate that G2rammar consistently outperforms competitive baselines by providing language models with the grammatical context needed to understand graph structures.
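The idea of annotating graph tokens with structural roles can be illustrated with a toy sketch. The snippet below is not the paper's method; it is a hypothetical simplification that tags each node as a hub, bridge, or peripheral member using degree alone (the paper describes richer centrality and neighborhood statistics), then attaches the tag to the node's token the way a part-of-speech label annotates a word.

```python
from collections import defaultdict

# Toy undirected graph: node "a" is a hub, "b" and "e" sit on a chain,
# "c", "d", "f" are leaves. Edge list and thresholds are illustrative only.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("e", "f")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def structural_role(node: str) -> str:
    """Assign a coarse topological role from degree (a stand-in for
    the centrality/neighborhood features described in the abstract)."""
    degree = len(adj[node])
    if degree >= 3:
        return "HUB"
    if degree == 2:
        return "BRIDGE"
    return "PERIPHERAL"

# Annotated tokens: node id plus its structural-grammar tag.
tokens = {n: f"{n}<{structural_role(n)}>" for n in sorted(adj)}
print(tokens)
```

With this toy graph the hub `a` gets the tag `a<HUB>`, the chain nodes `b` and `e` become bridges, and the leaves are peripheral; a language model consuming the linearized sequence then sees each node's topological function, not just its identity.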
Similar Papers
Each Graph is a New Language: Graph Learning with LLMs
Computation and Language
Teaches computers to understand connections in data.
A Novel Graph-Sequence Learning Model for Inductive Text Classification
Computation and Language
Helps computers understand text better, even new words.
When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected
Machine Learning (CS)
Computers understand complex connections without needing extra rules.