
CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation

Published: October 20, 2025 | arXiv ID: 2510.18895v1

By: Santhosh Kumar Ravindran

BigTech Affiliations: Microsoft

Potential Business Impact:

Makes AI write better code by learning from mistakes.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). The design is motivated by human and animal learning, where embarrassment from mistakes drives rapid correction, as when a puppy learns to avoid repeating an error after a single scolding. CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). High-negative-valence ("cringe") episodes, such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks such as HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g., syntax errors or logical bugs) by 48% and accelerates self-correction by 45%. Local experiments using Hugging Face models in a PySpark environment validate these gains, with code snippets provided for replication. Ablations confirm that valence tagging boosts curiosity in exploration and that pruning mitigates inefficiency. This framework extends reinforcement learning from human feedback (RLHF) toward more emotionally aware code assistants, with applications in IDEs and data pipelines. Code and the custom mini-world simulation are released.
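The replay mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the class and field names, the numeric thresholds, and the value ranges for valence and surprise are all assumptions made for illustration, and the MLP tagger is left out, so each episode arrives with its scores already attached. Only the five-fold replay of high-negative-valence episodes and the pruning of low-surprise successes come from the abstract.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Episode:
    code: str        # generated code for one trajectory
    valence: float   # assumed range [-1, 1]; negative means a "cringe" outcome
    surprise: float  # assumed range [0, 1]; how unexpected the outcome was


class DreamReplayBuffer:
    """Hypothetical buffer: routes tagged episodes and builds replay batches."""

    def __init__(self, cringe_threshold=-0.5, surprise_floor=0.2, replay_factor=5):
        # All three defaults are illustrative; only the factor of 5
        # ("five-fold replay") is taken from the abstract.
        self.cringe_threshold = cringe_threshold
        self.surprise_floor = surprise_floor
        self.replay_factor = replay_factor
        self.dream_queue = deque()  # high-negative-valence episodes
        self.regular = deque()      # everything else that is kept

    def add(self, ep):
        """Route one episode and report where it went."""
        if ep.valence <= self.cringe_threshold:
            self.dream_queue.append(ep)
            return "dream"
        if ep.valence > 0 and ep.surprise < self.surprise_floor:
            # Low-surprise success: dropped to limit buffer bloat.
            return "pruned"
        self.regular.append(ep)
        return "regular"

    def replay_batch(self):
        """Return episodes for one off-policy update, dream episodes repeated."""
        batch = []
        for ep in self.dream_queue:
            batch.extend([ep] * self.replay_factor)
        batch.extend(self.regular)
        return batch
```

With one buggy episode, one routine success, and one surprising success added, the batch would contain the buggy episode five times and the surprising success once, while the routine success never enters the buffer. A real implementation would presumably sample from the queue rather than replay it exhaustively.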

CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation

Software Engineering

Helps computers learn to create new, better code.

20 Dec 2025 1

87%

CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

Machine Learning (CS)

Teaches robots to learn new tasks by watching.

5 Jan 2026 0

87%

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Artificial Intelligence

Teaches computers to truly understand math, not just copy.

21 Dec 2025 0


Country of Origin

🇺🇸 United States

Page Count

12 pages
