Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
By: Yifan Zhou, Sachin Grover, Mohamed El Mistiri, and more
Potential Business Impact:
Teaches robots to learn faster with words.
Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical feedback with language, prior knowledge, and common sense. We introduce Prompted Policy Search (ProPS), a novel RL method that unifies numerical and linguistic reasoning within a single framework. Unlike prior work that augments existing RL components with language, ProPS places a large language model (LLM) at the center of the policy optimization loop, directly proposing policy updates based on both reward feedback and natural language input. We show that LLMs can perform numerical optimization in-context, and that incorporating semantic signals, such as goals, domain knowledge, and strategy hints, can lead to more informed exploration and sample-efficient learning. ProPS is evaluated across fifteen Gymnasium tasks, spanning classic control, Atari games, and MuJoCo environments, and compared to seven widely adopted RL algorithms (e.g., PPO, SAC, TRPO). It outperforms all baselines on eight of the fifteen tasks and demonstrates substantial gains when provided with domain knowledge. These results highlight the potential of unifying semantics and numerics for transparent, generalizable, and human-aligned RL.
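To make the core idea concrete, here is a minimal sketch of what an LLM-in-the-loop policy search could look like, assuming a linear policy on Gymnasium's CartPole-v1. The paper's exact prompt format, policy parameterization, and update procedure are not given here; `query_llm` is a hypothetical stand-in for any chat-completion API, and the prompt wording is illustrative only.

```python
# Sketch of a ProPS-style loop: the LLM sees the (parameters, return) history
# plus a natural-language hint, and proposes the next parameter vector.
import json
import numpy as np
import gymnasium as gym


def evaluate(env, params, episodes=5):
    """Average return of a linear policy a = 1[params . obs > 0] on CartPole."""
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = int(np.dot(params, obs) > 0.0)  # binary action from a linear score
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
    return total / episodes


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's chat API."""
    raise NotImplementedError


def props_loop(hint: str, iterations: int = 20):
    env = gym.make("CartPole-v1")
    params = np.zeros(env.observation_space.shape[0])
    history = []  # (parameters, return) pairs shown to the LLM each round
    for _ in range(iterations):
        score = evaluate(env, params)
        history.append({"params": params.tolist(), "return": score})
        prompt = (
            "You are optimizing a linear policy for CartPole.\n"
            f"Task hint: {hint}\n"
            f"History of (parameters, average return): {json.dumps(history)}\n"
            "Propose new parameters as a JSON list of 4 floats."
        )
        params = np.array(json.loads(query_llm(prompt)))
    return max(history, key=lambda h: h["return"])
```

The semantic signals the abstract describes would enter through the `hint` string (e.g., a goal description or strategy advice), while the numerical reasoning happens in-context as the LLM reads the reward history and proposes the next candidate.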
Similar Papers
PARL: Prompt-based Agents for Reinforcement Learning
Computation and Language
Teaches computers to learn by trying things.
LANPO: Bootstrapping Language and Numerical Feedback for Reinforcement Learning in LLMs
Machine Learning (CS)
Helps AI learn math faster from past mistakes.
SAS-Prompt: Large Language Models as Numerical Optimizers for Robot Self-Improvement
Robotics
Robots learn to play table tennis by practicing.