Efficient Domain-adaptive Continual Pretraining for the Process Industry in the German Language

Published: April 28, 2025 | arXiv ID: 2504.19856v3

By: Anastasia Zhukova, Christian E. Matt, Bela Gipp

Potential Business Impact:

Teaches computers new languages faster, cheaper.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Domain-adaptive continual pretraining (DAPT) is a state-of-the-art technique that further trains a language model (LM) on its pretraining task, e.g., masked language modeling (MLM), when the usual domain adaptation via LM fine-tuning is not possible due to a lack of labeled task data. Although popular, MLM requires a large corpus of domain-related data, which is difficult to obtain for specific domains in languages other than English, such as the process industry in German. This paper introduces an efficient approach called ICL-augmented pretraining (ICL-APT), which leverages in-context learning (ICL) and k-nearest neighbors (kNN) to augment target data with domain-related and in-domain texts, significantly reducing GPU time while maintaining strong model performance. Our results show that the best configuration of ICL-APT outperforms state-of-the-art DAPT by 28.7% (7.87 points) and requires almost 4 times less GPU computing time, providing a cost-effective solution for industries with limited computational capacity. The findings highlight the broader applicability of this framework to other low-resource industries, making NLP-based solutions more accessible and feasible in production environments.
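To make the kNN augmentation idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which retriever, value of k, or sample format ICL-APT uses, so the bag-of-words similarity, the `[SEP]` separator, the function names, and the German example texts are all illustrative assumptions. The sketch only shows the general pattern of retrieving the nearest neighbors of a target text and placing them in the same context as that text before continual pretraining.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a toy stand-in for a real text encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn(query, corpus, k):
    """Return the k corpus texts most similar to the query (brute-force search)."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]


def augment(target, corpus, k=2, sep=" [SEP] "):
    """Build one pretraining sample: retrieved neighbors first, target text last.

    The ordering and the separator are assumptions made for this sketch.
    """
    return sep.join(knn(target, corpus, k) + [target])


# Hypothetical shift-log style texts, invented for illustration only.
corpus = [
    "Dichtung der Pumpe wurde getauscht",
    "Ventil V-7 klemmt beim Öffnen",
    "Leckage an Pumpe festgestellt und gemeldet",
    "Kantine heute geschlossen",
]
target = "Pumpe P-101 zeigt Leckage an der Dichtung"

sample = augment(target, corpus, k=2)
print(sample)
```

In a realistic setup the word-count vectors would presumably be replaced by dense embeddings and an approximate nearest-neighbor index, and the augmented samples would then be fed to the MLM objective in place of the raw target texts.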

Continual Pre-Training is (not) What You Need in Domain Adaption

Computation and Language

Teaches AI to understand laws better.

18 Apr 2025 1

89%

ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Computation and Language

Makes small AI models work much better for businesses.

9 Jul 2025 1

89%

IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation

Artificial Intelligence

Teaches AI new things without forgetting old skills.

23 Oct 2025 0

Country of Origin

🇩🇪 Germany

Page Count

12 pages
