ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models
By: Yuanfeng Xu, Zehui Dai, Jian Liang and more
Potential Business Impact:
Teaches small language models to reason better on their own.
Small Language Models (SLMs) are a cost-effective alternative to Large Language Models (LLMs), but often struggle with complex reasoning due to their limited capacity and a tendency to produce mistakes or inconsistent answers during multi-step reasoning. Existing efforts have improved SLM performance, but typically at the cost of one or more of three key aspects: (1) reasoning capability, due to biased supervision that filters out negative reasoning paths and limits learning from errors; (2) autonomy, due to over-reliance on externally generated reasoning signals; and (3) generalization, which suffers when models overfit to teacher-specific patterns. In this paper, we introduce ReaLM, a reinforcement learning framework for robust and self-sufficient reasoning in vertical domains. To enhance reasoning capability, we propose Multi-Route Process Verification (MRPV), which contrasts both positive and negative reasoning paths to extract decisive patterns. To reduce reliance on external guidance and improve autonomy, we introduce Enabling Autonomy via Asymptotic Induction (EAAI), a training strategy that gradually fades external signals. To improve generalization, we apply guided chain-of-thought distillation to encode domain-specific rules and expert knowledge into SLM parameters, making them part of what the model has learned. Extensive experiments on both vertical and general reasoning tasks demonstrate that ReaLM significantly improves SLM performance across aspects (1)-(3) above.
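The abstract describes EAAI only as a training strategy that "gradually fades external signals" and gives no schedule. As a rough illustration of that idea, and not the authors' method, the sketch below assumes a linear decay of the probability that a training prompt includes an external hint. The names `guidance_probability` and `build_prompt`, the linear schedule, and the "Hint:" prompt format are all hypothetical.

```python
import random


def guidance_probability(step: int, total_steps: int,
                         start: float = 1.0, end: float = 0.0) -> float:
    """Assumed linear schedule: probability of showing an external hint.

    Decays from `start` at step 0 to `end` at `total_steps`, then stays flat.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps are safe.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def build_prompt(question: str, hint: str, step: int,
                 total_steps: int, rng: random.Random) -> str:
    """Attach the external hint with a probability that fades over training."""
    p = guidance_probability(step, total_steps)
    if rng.random() < p:
        return f"{question}\nHint: {hint}"
    # Late in training the model mostly sees the bare question.
    return question


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 1000):
        print(step, guidance_probability(step, 1000))
        print(build_prompt("What is 12 * 7?", "Split 12 into 10 + 2.",
                           step, 1000, rng))
```

Under this assumed schedule, early training steps always include the hint and the final steps never do, so the model is pushed toward reasoning without external guidance. Other decay shapes (cosine, exponential, stepwise) would fit the abstract's description equally well.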
Similar Papers
Guiding Reasoning in Small Language Models with LLM Assistance
Computation and Language
Helps small AI do hard thinking with big AI help.
ReTraceQA: Evaluating Reasoning Traces of Small Language Models in Commonsense Question Answering
Computation and Language
Finds when AI answers right but thinks wrong.
Reasoning Models Reason Well, Until They Don't
Artificial Intelligence
Makes smart computers better at solving hard problems.