ARCE: Augmented RoBERTa with Contextualized Elucidations for NER in Automated Rule Checking
By: Jian Chen, Jinbao Tian, Yankui Li, and more
Potential Business Impact:
Helps computers understand building plans better.
Accurate information extraction from specialized texts is a critical challenge, particularly for named entity recognition (NER) in the architecture, engineering, and construction (AEC) domain to support automated rule checking (ARC). The performance of standard pre-trained models is often constrained by the domain gap, as they struggle to interpret the specialized terminology and complex relational contexts inherent in AEC texts. Although this issue can be mitigated by further pre-training on large, human-curated domain corpora, as exemplified by methods like ARCBERT, this approach is both labor-intensive and cost-prohibitive. Consequently, leveraging large language models (LLMs) for automated knowledge generation has emerged as a promising alternative. However, the optimal strategy for generating knowledge that can genuinely enhance smaller, efficient models remains an open question. To address this, we propose ARCE (augmented RoBERTa with contextualized elucidations), a novel approach that systematically explores and optimizes this generation process. ARCE employs an LLM to first generate a corpus of simple, direct explanations, which we term Cote, and then uses this corpus to incrementally pre-train a RoBERTa model prior to its fine-tuning on the downstream task. Our extensive experiments show that ARCE establishes a new state-of-the-art on a benchmark AEC dataset, achieving a Macro-F1 score of 77.20%. This result also reveals a key finding: simple, explanation-based knowledge proves surprisingly more effective than complex, role-based rationales for this task. The code is publicly available at: https://github.com/nxcc-lab/ARCE.
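The pipeline the abstract describes (LLM-generated explanations, continued masked-language-model pre-training of RoBERTa, then NER fine-tuning) maps onto standard tooling. Below is a minimal sketch using the Hugging Face transformers and datasets libraries; the checkpoint (roberta-base), corpus file name (cote_corpus.txt), label count, and hyperparameters are illustrative assumptions, not details taken from the paper or its repository.

```python
# Minimal sketch of an ARCE-style pipeline: continued MLM pre-training on an
# LLM-generated explanation corpus ("Cote"), then NER fine-tuning.
# File names and hyperparameters below are illustrative assumptions.
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# Step 1: incremental pre-training on the explanation corpus,
# assumed here to be a plain-text file with one explanation per line.
cote = load_dataset("text", data_files={"train": "cote_corpus.txt"})["train"]
cote = cote.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

mlm_model = AutoModelForMaskedLM.from_pretrained("roberta-base")
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="arce-mlm", num_train_epochs=3),
    train_dataset=cote,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("arce-mlm")
tokenizer.save_pretrained("arce-mlm")

# Step 2: fine-tune the adapted encoder for NER (token classification) on the
# labelled AEC dataset; num_labels depends on the dataset's tag scheme.
ner_model = AutoModelForTokenClassification.from_pretrained("arce-mlm", num_labels=9)
# ... fine-tune with Trainer on the token-labelled AEC data as usual.
```

The key design point this sketch illustrates is that the generated explanations feed a standard MLM objective rather than a bespoke one, so the only change from ordinary domain-adaptive pre-training is the provenance of the corpus.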
Similar Papers
From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning
Computation and Language
Teaches computers to be smart in special subjects.
ARC-Encoder: learning compressed text representations for large language models
Computation and Language
Makes AI understand more text with less work.