The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
By: Subramanyam Sahoo, Jared Junkin
Large language models (LLMs) increasingly generate code with minimal human oversight, raising critical concerns about backdoor injection and other malicious behavior. We present the Cross-Trace Verification Protocol (CTVP), a novel AI control framework that verifies untrusted code-generating models through semantic orbit analysis. Rather than directly executing potentially malicious code, CTVP leverages the model's own predictions of execution traces across semantically equivalent program transformations. By analyzing consistency patterns in these predicted traces, we detect behavioral anomalies indicative of backdoors. Our approach introduces the Adversarial Robustness Quotient (ARQ), which quantifies the computational cost of verification relative to baseline generation and grows exponentially with orbit size. Theoretical analysis establishes information-theoretic bounds showing non-gamifiability: adversaries cannot improve through training due to fundamental space-complexity constraints. This work demonstrates that semantic orbit analysis provides a scalable, theoretically grounded approach to AI control for code-generation tasks.
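To make the verification idea concrete, the following is a minimal Python sketch of a cross-trace consistency check as we read it from the abstract: build an "orbit" of semantically equivalent variants of a generated program, ask the untrusted model to predict an execution trace for each variant, and flag disagreement as suspicious. The renaming transform, the predict_trace interface, and the placeholder trace function are illustrative assumptions, not the paper's exact construction; likewise, one plausible reading of the ARQ is the ratio of total verification cost to the cost of a single baseline generation, which scales with the orbit size.

# Minimal sketch of cross-trace verification. Assumptions: the predict_trace
# callable and the renaming transform are illustrative stand-ins, not the
# paper's exact interfaces.
import ast
from typing import Callable, List


def _bound_names(tree: ast.AST) -> set:
    """Collect names bound inside the program (assignment targets and arguments)."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
    return names


def rename_locals(source: str, suffix: str = "_alt") -> str:
    """One semantics-preserving transform: rename locally bound names only."""
    tree = ast.parse(source)
    bound = _bound_names(tree)

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id in bound:
                node.id += suffix
            return node

        def visit_arg(self, node):
            if node.arg in bound:
                node.arg += suffix
            return node

    return ast.unparse(ast.fix_missing_locations(Renamer().visit(tree)))


def cross_trace_verify(
    source: str,
    test_input: str,
    predict_trace: Callable[[str, str], str],
    transforms: List[Callable[[str], str]],
) -> bool:
    """Return True if the model's predicted traces agree across the whole orbit."""
    orbit = [source] + [t(source) for t in transforms]
    traces = [predict_trace(variant, test_input) for variant in orbit]
    # Behavioral consistency: every equivalent variant should yield the same trace.
    return len(set(traces)) == 1


if __name__ == "__main__":
    program = "def add(a, b):\n    return a + b\n"

    # Placeholder for the untrusted model; a real system would query the LLM here.
    def fake_predict_trace(src: str, inp: str) -> str:
        return f"call {inp} -> return"

    consistent = cross_trace_verify(program, "add(2, 3)", fake_predict_trace, [rename_locals])
    print("consistent" if consistent else "suspicious")

In this reading, a backdoored model must keep its predicted traces consistent across every variant it has never seen, which is what drives the claimed exponential verification cost in the orbit size.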