Automatic Reward Shaping from Multi-Objective Human Heuristics
By: Yuqing Xie, Jiayu Chen, Wenhao Tang, and more
Designing effective reward functions remains a central challenge in reinforcement learning, especially in multi-objective environments. In this work, we propose Multi-Objective Reward Shaping with Exploration (MORSE), a general framework that automatically combines multiple human-designed heuristic rewards into a unified reward function. MORSE formulates the shaping process as a bi-level optimization problem: the inner loop trains a policy to maximize the current shaped reward, while the outer loop updates the reward function to optimize task performance. To encourage exploration in the reward space and avoid poor local optima, MORSE introduces stochasticity into the shaping process, injecting noise guided by task performance and the prediction error of a fixed, randomly initialized neural network. Experimental results in MuJoCo and Isaac Sim environments show that MORSE effectively balances multiple objectives across various robotic tasks, achieving task performance comparable to that obtained with manually tuned reward functions.
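To make the bi-level structure concrete, here is a minimal toy sketch of the outer loop, written under heavy assumptions: the inner RL loop is replaced by a black-box performance score, the outer update is plain hill-climbing rather than the paper's optimization procedure, and the random-network novelty bonus is computed over the reward-weight vector. All names (target_net, inner_loop_performance, W_STAR, and so on) are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3    # number of human-designed heuristic rewards
H = 16   # output width of the fixed random network

# Fixed, randomly initialized target network f(w); it is never trained.
A, b = rng.normal(size=(H, K)), rng.normal(size=H)

def target_net(w):
    return np.tanh(A @ w + b)

# Trainable linear predictor g(w). Its squared error against f(w) stays
# large for weight vectors that have rarely been tried (RND-style novelty).
P = np.zeros((H, K))

def predictor(w):
    return P @ w

def novelty(w):
    return float(np.mean((target_net(w) - predictor(w)) ** 2))

def update_predictor(w, lr=0.05):
    global P
    err = predictor(w) - target_net(w)   # (H,)
    P -= lr * np.outer(err, w)           # one SGD step on the squared error

# Stand-in for the inner loop: "train a policy" on the shaped reward
# sum_k w_k * r_k and report task performance J(w). Here J is a toy
# quadratic with an unknown best weighting W_STAR (purely illustrative).
W_STAR = np.array([0.7, 0.2, 0.1])

def inner_loop_performance(w):
    return -float(np.sum((w - W_STAR) ** 2))

# Outer loop: perturb the reward weights and keep improvements. The
# perturbation scale grows with novelty and with stagnation (steps since
# the last improvement), echoing noise guided by task performance and
# the random network's prediction error.
w = np.ones(K) / K
best_J = inner_loop_performance(w)
stagnation = 0
for step in range(300):
    sigma = 0.02 * (1 + 0.1 * stagnation) + 0.5 * novelty(w)
    w_try = w + sigma * rng.normal(size=K)
    J_try = inner_loop_performance(w_try)
    update_predictor(w_try)              # mark this weight region as visited
    if J_try > best_J:
        w, best_J, stagnation = w_try, J_try, 0
    else:
        stagnation += 1

print("learned weights:", np.round(w, 3), "  J:", round(best_J, 4))
```

In the actual method, inner_loop_performance would be a full policy-training run on the shaped reward and the outer step would follow the paper's bi-level formulation; this sketch only illustrates how performance- and novelty-guided noise can keep the search over reward weights from settling into poor local optima.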
Similar Papers
MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization
Machine Learning (CS)
Teaches AI to control traffic better using smart lessons.
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
Computation and Language
Teaches AI to do many things well.
MORSE: Multi-Objective Reinforcement Learning via Strategy Evolution for Supply Chain Optimization
Artificial Intelligence
Helps companies make better, faster supply chain choices.