SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models

Published: January 7, 2026 | arXiv ID: 2601.03555v1

By: Yuxuan Jiang, Francis Ferraro

Potential Business Impact:

Teaches language models to use tools more reliably.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Training reliable tool-augmented agents remains a significant challenge, largely due to the difficulty of credit assignment in multi-step reasoning. While process-level reward models offer a promising direction, existing LLM-based judges often produce noisy and inconsistent signals because they lack fine-grained, task-specific rubrics to distinguish high-level planning from low-level execution. In this work, we introduce SCRIBE (Skill-Conditioned Reward with Intermediate Behavioral Evaluation), a reinforcement learning framework that intervenes at a novel mid-level abstraction. SCRIBE grounds reward modeling in a curated library of skill prototypes, transforming open-ended LLM evaluation into a constrained verification problem. By routing each subgoal to a corresponding prototype, the reward model is equipped with precise, structured rubrics that substantially reduce reward variance. Experimental results show that SCRIBE achieves state-of-the-art performance across a range of reasoning and tool-use benchmarks. In particular, it improves the AIME25 accuracy of a Qwen3-4B model from 43.3% to 63.3%, and significantly increases success rates in complex multi-turn tool interactions. Further analysis of training dynamics reveals a co-evolution across abstraction levels, where mastery of mid-level skills consistently precedes the emergence of effective high-level planning behaviors. Finally, we demonstrate that SCRIBE is additive to low-level tool optimizations, providing a scalable and complementary pathway toward more autonomous and reliable tool-using agents.
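
The routing idea in the abstract (send each subgoal to a skill prototype, then verify the step against that prototype's rubric) can be illustrated with a toy sketch. This is not the authors' implementation: the `SkillPrototype` type, the two library entries, the keyword router, and the string-matching rubric checks are all hypothetical stand-ins. In particular, the abstract describes an LLM-based reward model doing the verification; plain predicates are swapped in here only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical data structure: the abstract does not specify how prototypes are stored.
@dataclass(frozen=True)
class SkillPrototype:
    name: str
    keywords: Tuple[str, ...]  # used only by the toy router below
    rubric: Tuple[Tuple[str, Callable[[str], bool]], ...]  # (criterion, check)

# Hypothetical two-entry library; a real library would be curated per task.
LIBRARY = (
    SkillPrototype(
        name="call_calculator",
        keywords=("compute", "calculate", "evaluate"),
        rubric=(
            ("invokes the calculator tool", lambda step: "calculator(" in step),
            ("states a result", lambda step: "=" in step),
        ),
    ),
    SkillPrototype(
        name="search_lookup",
        keywords=("find", "look up", "search"),
        rubric=(
            ("invokes the search tool", lambda step: "search(" in step),
            ("cites a source", lambda step: "source:" in step.lower()),
        ),
    ),
)

def route(subgoal: str) -> Optional[SkillPrototype]:
    """Toy router: pick the prototype with the most keyword hits, or None."""
    text = subgoal.lower()
    scored = [(sum(k in text for k in p.keywords), p) for p in LIBRARY]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score > 0 else None

def mid_level_reward(subgoal: str, step: str) -> float:
    """Fraction of rubric criteria the step satisfies; 0.0 if nothing matches."""
    proto = route(subgoal)
    if proto is None:
        return 0.0
    passed = sum(check(step) for _, check in proto.rubric)
    return passed / len(proto.rubric)

if __name__ == "__main__":
    print(mid_level_reward("Compute 3*7", "calculator(3*7) = 21"))
    print(mid_level_reward("Find the capital of France", "search('capital of France')"))
```

Because each rubric is a fixed checklist rather than an open-ended judgment, the same step always receives the same score in this sketch, which is the intuition behind the variance reduction the abstract claims.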

SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling

Computation and Language

Helps teachers give better, private student feedback.

30 Oct 2025 1

87%

Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents

Computation and Language

Teaches computers to use tools with voice commands.

17 Sep 2025 3

87%

SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

Artificial Intelligence

Teaches computers to plan faster without asking experts.

10 Dec 2025 0

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com github.com huggingface.co

Page Count

16 pages
