Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
By: Zhenjiang Mao, Artem Bisliouk, Rohith Reddy Nama and more
Potential Business Impact:
Makes AI math answers more trustworthy.
Large Language Models (LLMs) have shown impressive performance in mathematical reasoning tasks when guided by Chain-of-Thought (CoT) prompting. However, they tend to produce highly confident yet incorrect outputs, which poses significant risks in domains like education, where users may lack the expertise to assess reasoning steps. To address this, we propose a structured framework that models stepwise confidence as a temporal signal and evaluates it using Signal Temporal Logic (STL). In particular, we define formal STL-based constraints to capture desirable temporal properties and compute robustness scores that serve as structured, interpretable confidence estimates. Our approach also introduces a set of uncertainty reshaping strategies to enforce smoothness, monotonicity, and causal consistency across the reasoning trajectory. Experiments show that our approach consistently improves calibration metrics and provides more reliable uncertainty estimates than conventional confidence aggregation and post-hoc calibration.
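To make the idea of scoring a stepwise confidence signal concrete, here is a minimal sketch using the standard quantitative semantics of STL over a discrete signal. The abstract does not give the paper's actual formulas, thresholds, or reshaping rules, so everything below is an assumption for illustration: the function names, the threshold values, and the use of a running minimum as one possible way to make confidence monotone and causal.

```python
from typing import List


def always_at_least(signal: List[float], threshold: float) -> float:
    """Robustness of 'always (c_t >= threshold)': the worst-case margin."""
    return min(c - threshold for c in signal)


def eventually_at_least(signal: List[float], threshold: float) -> float:
    """Robustness of 'eventually (c_t >= threshold)': the best-case margin."""
    return max(c - threshold for c in signal)


def always_smooth(signal: List[float], max_jump: float) -> float:
    """Robustness of 'always (|c_{t+1} - c_t| <= max_jump)'."""
    if len(signal) < 2:
        return max_jump  # no transitions, so the constraint holds trivially
    return min(max_jump - abs(b - a) for a, b in zip(signal, signal[1:]))


def running_min(signal: List[float]) -> List[float]:
    """Hypothetical reshaping: confidence at step t cannot exceed any earlier
    step, so an uncertain step lowers confidence in everything after it."""
    reshaped = []
    current = float("inf")
    for c in signal:
        current = min(current, c)
        reshaped.append(current)
    return reshaped


# Hypothetical per-step confidences for a four-step reasoning chain.
confidences = [0.9, 0.8, 0.4, 0.7]

print(always_at_least(confidences, 0.5))      # negative: step 3 dips below 0.5
print(eventually_at_least(confidences, 0.5))  # positive: some step exceeds 0.5
print(always_smooth(confidences, 0.3))        # negative: one jump exceeds 0.3
print(running_min(confidences))               # non-increasing version of the signal
```

A positive robustness value means the property holds with that much margin, and a negative value means it is violated by that much, which is what makes the score interpretable as a structured confidence estimate. How the paper combines such scores into a final calibrated confidence is not stated in the abstract.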
Similar Papers
Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs
Software Engineering
Helps computers write better code by thinking less.
Non-Iterative Symbolic-Aided Chain-of-Thought for Logical Reasoning
Artificial Intelligence
Helps computers think through problems better.
Answer Convergence as a Signal for Early Stopping in Reasoning
Computation and Language
Makes smart computers think less, saving time and money.