MERGE: Minimal Expression-Replacements GEneralization Test for Natural Language Inference
By: Mădălina Zgreabăn, Tejaswini Deoskar, Lasha Abzianidze
Potential Business Impact:
Tests whether AI still understands sentences when words are changed.
In recent years, many generalization benchmarks have shown language models' lack of robustness in natural language inference (NLI). However, manually creating new benchmarks is costly, while automatically generating high-quality ones, even by modifying existing benchmarks, is extremely difficult. In this paper, we propose a methodology for automatically generating high-quality variants of original NLI problems by replacing open-class words, while crucially preserving their underlying reasoning. We dub our generalization test MERGE (Minimal Expression-Replacements GEneralization); it evaluates the correctness of models' predictions across reasoning-preserving variants of the original problem. Our results show that NLI models perform 4-20% worse on variants, suggesting low generalizability even on such minimally altered problems. We also analyse how the word class of the replacements, word probability, and plausibility influence NLI models' performance.
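To make the evaluation protocol concrete, below is a minimal Python sketch of a MERGE-style consistency check. It is not the authors' pipeline: WordNet synonyms stand in for the paper's more careful reasoning-preserving replacement procedure, and `variant_problems`, `merge_consistency`, and `toy_predict` are hypothetical names introduced here for illustration; a real evaluation would call an actual NLI model in place of `toy_predict`.

```python
# Minimal sketch of a MERGE-style consistency check (not the authors' code).
# Assumes NLTK is installed and WordNet data is available: nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def variant_problems(premise, hypothesis, word, pos=wn.NOUN, k=3):
    """Build up to k (premise, hypothesis) variants where `word` is replaced
    by a WordNet synonym in BOTH sentences, keeping the shared term aligned
    so the underlying reasoning is (approximately) preserved."""
    seen, variants = {word}, []
    for synset in wn.synsets(word, pos=pos):
        for lemma in synset.lemmas():
            candidate = lemma.name().replace("_", " ")
            if candidate in seen:
                continue
            seen.add(candidate)
            variants.append((premise.replace(word, candidate),
                             hypothesis.replace(word, candidate)))
            if len(variants) == k:
                return variants
    return variants

def merge_consistency(predict, premise, hypothesis, word, gold, k=3):
    """Fraction of variants on which the model still predicts the gold
    label; 1.0 means the model generalizes across all replacements."""
    variants = variant_problems(premise, hypothesis, word, k=k)
    if not variants:  # no synonyms found, so there is nothing to test
        return 1.0
    hits = sum(predict(p, h) == gold for p, h in variants)
    return hits / len(variants)

if __name__ == "__main__":
    # Toy stand-in for an NLI model: entailment iff the hypothesis
    # is a substring of the premise. A real run would query a model.
    def toy_predict(premise, hypothesis):
        h = hypothesis.lower().rstrip(".")
        return "entailment" if h in premise.lower() else "neutral"

    score = merge_consistency(toy_predict,
                              "A dog is sleeping on the couch.",
                              "A dog is sleeping.",
                              word="dog", gold="entailment")
    print(f"variant consistency: {score:.2f}")
```

Replacing the word in both premise and hypothesis keeps the shared term aligned, which is the property that lets the original gold label carry over to every variant.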
Similar Papers
Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression
Artificial Intelligence
Tests if AI can think logically, even when rules change.
Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine-Tuning
Computation and Language
Helps computers understand sentence structure better.