CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents

Published: January 5, 2026 | arXiv ID: 2601.02201v1

By: Keyu Wang, Bingchen Miao, Wendong Bu and more

Potential Business Impact:

Teaches robots to learn new tasks by watching.

Business Areas:

Machine Learning, Artificial Intelligence, Data and Analytics, Software

The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challenges: Behavior Cloning is simple and effective through imitation but suffers from low behavioral diversity, while Reinforcement Learning is capable of discovering novel strategies through exploration but relies heavily on manually designed reward functions. To address the conflict between these two methods, we present CORE, a Code-based Inverse Self-Training Framework with Graph Expansion that bridges imitation and exploration, offering a novel training framework that promotes behavioral diversity while eliminating the reliance on manual reward design. Specifically, we introduce Semantic Code Abstraction to automatically infer reward functions from expert demonstrations without manual design. The inferred reward function, referred to as the Label Function, is executable code that verifies one key step within a task. Building on this, we propose Strategy Graph Expansion to enhance in-domain behavioral diversity, which constructs a multi-path graph called the Strategy Graph that captures diverse valid solutions beyond expert demonstrations. Furthermore, we introduce Trajectory-Guided Extrapolation, which enriches out-of-domain behavioral diversity by utilizing both successful and failed trajectories to expand the task space. Experiments on Web and Android platforms demonstrate that CORE significantly improves both overall performance and generalization, highlighting its potential as a robust and generalizable training paradigm for building powerful virtual agents.
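To make the two central ideas in the abstract concrete, here is a minimal Python sketch of what a Label Function and a Strategy Graph might look like. It is an illustration under stated assumptions, not the authors' implementation: the `State` dictionary, the example predicates, the `StrategyGraph` class, and the fraction-based `trajectory_reward` are all hypothetical names and design choices introduced here. The abstract only states that a Label Function is executable code verifying one key step, and that the Strategy Graph is a multi-path graph of valid solutions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical snapshot of the UI after an action (not from the paper).
State = Dict[str, str]
LabelFunction = Callable[[State], bool]


def query_typed(state: State) -> bool:
    """Hypothetical Label Function: checks the key step 'query was typed'."""
    return state.get("search_box", "") == "wireless mouse"


def results_shown(state: State) -> bool:
    """Hypothetical Label Function: checks the key step 'results page open'."""
    return state.get("page") == "results"


def trajectory_reward(states: List[State], checks: List[LabelFunction]) -> float:
    """Assumed scoring rule: fraction of key steps verified somewhere in the trajectory."""
    passed = sum(1 for check in checks if any(check(s) for s in states))
    return passed / len(checks)


@dataclass
class StrategyGraph:
    """Multi-path graph whose nodes are step names (an assumed representation)."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def add_path(self, steps: List[str]) -> None:
        # Merge one valid solution into the graph, sharing common nodes.
        for src, dst in zip(steps, steps[1:]):
            self.edges.setdefault(src, set()).add(dst)

    def paths(self, start: str, goal: str) -> List[List[str]]:
        # Depth-first enumeration of all simple paths from start to goal.
        found: List[List[str]] = []

        def walk(node: str, prefix: List[str]) -> None:
            if node == goal:
                found.append(prefix)
                return
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in prefix:
                    walk(nxt, prefix + [nxt])

        walk(start, [start])
        return found


expert = [
    {"page": "home", "search_box": ""},
    {"page": "home", "search_box": "wireless mouse"},
    {"page": "results", "search_box": "wireless mouse"},
]
checks = [query_typed, results_shown]
full_reward = trajectory_reward(expert, checks)         # all key steps verified
partial_reward = trajectory_reward(expert[:2], checks)  # results page never reached

graph = StrategyGraph()
graph.add_path(["start", "type_query", "press_enter", "results"])          # expert path
graph.add_path(["start", "type_query", "click_search_button", "results"])  # alternative path
all_paths = graph.paths("start", "results")
```

In this sketch the expert demonstration contributes one path, and a second valid route that reaches the same goal is merged into the same graph. That is one plausible reading of how a graph can hold solutions beyond the demonstrated one.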

CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Artificial Intelligence

Teaches computers to truly understand math, not just copy.

21 Dec 2025

CORE: A Conceptual Reasoning Layer for Large Language Models

Computation and Language

Keeps chatbots remembering conversations better.

10 Dec 2025

CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation

Software Engineering

Helps computers learn to create new, better code.

20 Dec 2025

Country of Origin

🇨🇳 China

Page Count

19 pages
