Rethinking Graph-Based Document Classification: Learning Data-Driven Structures Beyond Heuristic Approaches

Published: July 18, 2025 | arXiv ID: 2508.00864v1

By: Margarita Bugueño, Gerard de Melo

Potential Business Impact:

Links sentences automatically to sort documents better

In document classification, graph-based models effectively capture document structure, overcoming sequence length limitations and enhancing contextual understanding. However, most existing graph document representations rely on heuristics, domain-specific rules, or expert knowledge. Unlike previous approaches, we propose a method to learn data-driven graph structures, eliminating the need for manual design and reducing domain dependence. Our approach constructs homogeneous weighted graphs with sentences as nodes, while edges are learned via a self-attention model that identifies dependencies between sentence pairs. A statistical filtering strategy aims to retain only strongly correlated sentences, improving graph quality while reducing the graph size. Experiments on three document classification datasets demonstrate that learned graphs consistently outperform heuristic-based graphs, achieving higher accuracy and $F_1$ score. Furthermore, our study demonstrates the effectiveness of the statistical filtering in improving classification robustness. These results highlight the potential of automatic graph generation over traditional heuristic approaches and open new directions for broader applications in NLP.
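The pipeline described in the abstract (sentences as nodes, attention-derived edge weights, then a statistical filter) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `learned_sentence_graph`, the projection matrices `w_q` and `w_k`, and the mean-plus-`k_std`-standard-deviations threshold are all assumptions made for illustration, since the abstract does not specify the filtering rule. In the real method the attention model is trained, whereas here the projections are random placeholders.

```python
import numpy as np


def learned_sentence_graph(sent_emb, w_q, w_k, k_std=1.0):
    """Build a weighted sentence graph from scaled dot-product attention.

    sent_emb: (n_sentences, d) sentence embeddings (hypothetical input).
    w_q, w_k: (d, d_k) query/key projections (trained in practice, random here).
    k_std:    assumed filter strength; edges must exceed mean + k_std * std.
    """
    q = sent_emb @ w_q
    k = sent_emb @ w_k
    scores = q @ k.T / np.sqrt(q.shape[1])

    # Numerically stable row-wise softmax gives attention weights per sentence.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)

    # Drop self-loops so only sentence-pair dependencies remain.
    np.fill_diagonal(attn, 0.0)

    # Assumed statistical filter: keep only edges well above the average weight.
    n = attn.shape[0]
    off_diag = attn[~np.eye(n, dtype=bool)]
    threshold = off_diag.mean() + k_std * off_diag.std()
    adj = np.where(attn > threshold, attn, 0.0)
    return adj, threshold


# Toy usage: 6 sentences with 16-dimensional embeddings and random projections.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 16))
w_q = rng.normal(size=(16, 8))
w_k = rng.normal(size=(16, 8))
adj, threshold = learned_sentence_graph(emb, w_q, w_k)
print("edges kept:", np.count_nonzero(adj), "of", 6 * 5)
```

The resulting matrix `adj` is a directed, weighted adjacency matrix over sentences. Raising `k_std` keeps fewer, stronger edges, which mirrors the stated goal of shrinking the graph while retaining strongly correlated sentence pairs.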

When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected

Machine Learning (CS)

Computers understand complex connections without needing extra rules.

20 Nov 2025 1

88%

Knowledge Graph-Infused Fine-Tuning for Structured Reasoning in Large Language Models

Computation and Language

Helps computers understand facts and connect ideas.

20 Aug 2025 0

87%

StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching

Computation and Language

Helps computers understand legal documents better.

2 Sep 2025 1

Repos / Data Links

github.com

Page Count

11 pages
