
When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents

Published: January 1, 2026 | arXiv ID: 2601.00513v1

By: Laksh Advani

Potential Business Impact:

Fixes AI that gives right answers for wrong reasons.

Business Areas:

Intelligent Systems Artificial Intelligence, Data and Analytics, Science and Engineering

Deploying small language models (7-9B parameters) as autonomous agents requires trust in their reasoning, not just their outputs. We reveal a critical reliability crisis: 50-69% of correct answers from these models contain fundamentally flawed reasoning, a "Right-for-Wrong-Reasons" phenomenon invisible to standard accuracy metrics. Through analysis of 10,734 reasoning traces across three models and diverse tasks, we introduce the Reasoning Integrity Score (RIS), a process-based metric validated with substantial inter-rater agreement (κ = 0.657). Our findings challenge conventional practices: while retrieval-augmented generation (RAG) significantly improves reasoning integrity (Cohen's d = 0.23 to 0.93), meta-cognitive interventions like self-critique often harm performance (d = -0.14 to -0.33) in small models on the evaluated tasks. Mechanistic analysis reveals that RAG succeeds by grounding calculations in external evidence, reducing errors by 7.6%, while meta-cognition amplifies confusion when the model lacks sufficient capacity. To enable deployment, we distill verification capabilities into a neural classifier achieving 0.86 F1-score with a 100× speedup. These results underscore the necessity of process-based verification for trustworthy agents: accuracy alone is dangerously insufficient when models can be right for entirely wrong reasons.
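The abstract reports an inter-rater agreement statistic (κ) and effect sizes (Cohen's d) without showing how such numbers are computed. The sketch below illustrates the standard textbook formulas on made-up labels and scores. It assumes κ is Cohen's kappa for two raters and that d uses the pooled standard deviation; the abstract confirms neither choice, and the function names, labels, and data here are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def cohens_d(treatment, control):
    """Cohen's d with pooled standard deviation (assumed variant).

    Positive values mean the treatment group scored higher than control.
    """
    n_t, n_c = len(treatment), len(control)
    if n_t < 2 or n_c < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical trace labels from two annotators.
    annotator_1 = ["sound", "sound", "flawed", "flawed"]
    annotator_2 = ["sound", "flawed", "flawed", "flawed"]
    print(cohens_kappa(annotator_1, annotator_2))

    # Hypothetical integrity scores with and without an intervention.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Under this sign convention, a negative d (as reported for self-critique) means the intervention group scored lower than the baseline group.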

Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research

Computers and Society

Helps computers find child safety risks faster.

3 Dec 2025 1

89%

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

Artificial Intelligence

Teaches computers to think and check their own answers.

8 Jan 2026 0

89%

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

Artificial Intelligence

Makes AI think better by arguing with itself.

8 Jan 2026 0


Country of Origin

🇺🇸 United States

Page Count

6 pages
