Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis
By: Da Song, Yuheng Huang, Boqi Chen, and more
The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains these systems must strictly adhere to regulatory standards beyond simple functional correctness. Existing benchmarks, however, often overlook implicit regulatory compliance and thus fail to evaluate whether LLMs can autonomously enforce mandatory safety constraints. To fill this gap, we introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic (LTL) oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces. Building on this framework, we construct LogiSafetyBench, a benchmark of 240 human-verified tasks that require LLMs to generate Python programs satisfying both functional objectives and latent compliance rules. Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, resulting in non-compliant behavior.
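To give a concrete sense of what an LTL oracle over a finite execution trace could look like, here is a minimal sketch in Python. It is illustrative only, assuming a simple encoding of traces as sequences of proposition sets; the helper names and the example rule (G(tool_call -> F audit_logged)) are hypothetical and not taken from the paper.

```python
# Minimal finite-trace LTL safety oracle (illustrative sketch; the trace
# encoding, proposition names, and helpers are hypothetical, not the
# LogiSafetyGen implementation).

from typing import Callable, Sequence, Set

Trace = Sequence[Set[str]]          # each step: the set of true propositions
Formula = Callable[[Trace, int], bool]

def atom(p: str) -> Formula:
    return lambda trace, i: p in trace[i]

def implies(f: Formula, g: Formula) -> Formula:
    return lambda trace, i: (not f(trace, i)) or g(trace, i)

def globally(f: Formula) -> Formula:
    # G f: f holds at every remaining step of the finite trace
    return lambda trace, i: all(f(trace, j) for j in range(i, len(trace)))

def eventually(f: Formula) -> Formula:
    # F f: f holds at some remaining step of the finite trace
    return lambda trace, i: any(f(trace, j) for j in range(i, len(trace)))

# Example compliance rule: every tool call must eventually be followed by
# an audit-log entry -- G(tool_call -> F audit_logged)
oracle = globally(implies(atom("tool_call"), eventually(atom("audit_logged"))))

trace: Trace = [{"tool_call"}, set(), {"audit_logged"}]
print(oracle(trace, 0))  # True: the tool call is eventually logged
```

An oracle of this shape can flag a synthesized or LLM-generated trace as non-compliant whenever the safety property is violated, which is the role the LTL oracles play in guiding trace synthesis and scoring generated programs.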
Similar Papers
Neuro-Symbolic Compliance: Integrating LLMs and SMT Solvers for Automated Financial Legal Analysis
Artificial Intelligence
Combines LLMs with SMT solvers to automate compliance analysis of financial regulations.
Automatic Generation of Safety-compliant Linear Temporal Logic via Large Language Model: A Self-supervised Framework
Logic in Computer Science
A self-supervised framework for generating safety-compliant Linear Temporal Logic specifications with LLMs.
Evaluating Metrics for Safety with LLM-as-Judges
Computation and Language
Evaluates how reliably LLM-as-judge metrics measure safety in high-stakes applications.