STAR: Detecting Inference-time Backdoors in LLM Reasoning via State-Transition Amplification Ratio
By: Seong-Gyu Park, Sohee Park, Jisu Lee, and more
Recent LLMs increasingly integrate reasoning mechanisms such as Chain-of-Thought (CoT). However, this explicit reasoning exposes a new attack surface: inference-time backdoors that inject malicious reasoning paths without altering model parameters. Because these attacks generate linguistically coherent paths, they effectively evade conventional detection. To address this, we propose STAR (State-Transition Amplification Ratio), a framework that detects backdoors by analyzing shifts in output probabilities. STAR exploits a statistical discrepancy: a reasoning path induced by a malicious input exhibits high posterior probability despite having low prior probability under the model's general knowledge. We quantify this state-transition amplification and employ the CUSUM algorithm to detect persistent anomalies. Experiments across diverse models (8B–70B) and five benchmark datasets demonstrate that STAR generalizes robustly, consistently achieving near-perfect performance (AUROC $\approx$ 1.0) with approximately $42\times$ greater efficiency than existing baselines. The framework also remains robust against adaptive attacks that attempt to bypass detection.
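To make the detection stage concrete, here is a minimal sketch of a one-sided CUSUM detector of the kind the abstract describes, assuming per-step log amplification ratios (log posterior minus log prior for each reasoning step) have already been computed. The function name, drift, and threshold values are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def cusum_detect(ratios, drift=0.5, threshold=5.0):
    """One-sided CUSUM over per-step amplification scores.

    `ratios` is assumed to hold log amplification ratios per
    reasoning step, e.g. log p(step | observed context) minus
    log p(step | model prior). STAR's exact scoring may differ;
    this is a hypothetical sketch of the anomaly-detection stage.
    """
    s = 0.0
    for t, r in enumerate(ratios):
        # Accumulate evidence of a persistent upward shift;
        # clamp at zero so isolated benign steps cannot go negative.
        s = max(0.0, s + r - drift)
        if s > threshold:
            return t  # index of first detected persistent anomaly
    return None  # no persistent anomaly detected

# Toy usage: benign steps hover near zero amplification, while a
# backdoored path shows sustained high amplification after step 5.
scores = np.concatenate([np.random.normal(0.0, 0.3, 5),
                         np.random.normal(2.0, 0.3, 10)])
print(cusum_detect(scores.tolist()))
```

Because CUSUM accumulates deviations over consecutive steps, it flags the sustained probability shift of an injected reasoning path while ignoring one-off fluctuations on benign inputs.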