Constrained Policy Optimization via Sampling-Based Weight-Space Projection
By: Shengfan Cao, Francesco Borrelli
Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy unknown, rollout-based safety constraints. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. Our approach constructs a local safe region by combining trajectory rollouts with smoothness bounds that relate parameter changes to shifts in safety metrics. Each gradient update is then projected onto this region by solving a convex second-order cone program (SOCP), producing a safe first-order step. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe provided each projection is feasible. In constrained control settings with a stabilizing backup policy, our approach further ensures closed-loop stability and enables safe adaptation beyond the conservative backup. On regression with harmful supervision and a constrained double-integrator task with a malicious expert, our approach consistently rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful primal objective improvement.
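The projection idea can be illustrated with a deliberately simplified sketch; it is not the paper's SOCP. It assumes a single scalar safety metric g(theta), with g(theta) <= 0 meaning safe, that is evaluated only by rollout and is Lipschitz in the parameters with a known constant. Under that assumption, any step with Euclidean norm at most -g(theta) / L keeps the metric non-positive, so the safe region is a ball and the projection has a closed form. The names `project_step`, `safe_descent`, `grad_fn`, `safety_fn`, and `lipschitz` are hypothetical.

```python
import math


def project_step(step, margin, lipschitz):
    """Project `step` onto the ball of radius margin / lipschitz.

    Assumption: |g(theta + d) - g(theta)| <= lipschitz * ||d||, so a step
    of norm <= margin / lipschitz cannot push g above zero.
    """
    radius = max(margin, 0.0) / lipschitz
    norm = math.sqrt(sum(s * s for s in step))
    if norm <= radius:
        return list(step)  # the raw step is already inside the safe ball
    scale = radius / norm  # otherwise shrink it onto the ball's boundary
    return [scale * s for s in step]


def safe_descent(theta, grad_fn, safety_fn, lipschitz, lr=0.1, iters=50):
    """Gradient descent where every step is projected before it is applied.

    safety_fn is treated as a black box (for example, a rollout-based
    metric); only its value at the current parameters is used.
    """
    if safety_fn(theta) > 0:
        raise ValueError("initial parameters must be safe")
    history = [list(theta)]
    for _ in range(iters):
        margin = -safety_fn(theta)  # distance to the constraint, in metric units
        step = [-lr * g for g in grad_fn(theta)]
        delta = project_step(step, margin, lipschitz)
        theta = [t + d for t, d in zip(theta, delta)]
        history.append(list(theta))
    return theta, history
```

For instance, minimizing (theta - 3)^2 subject to theta - 1 <= 0 from theta = 0 moves toward the constraint boundary and then stops accepting further progress, which mirrors the safe-by-induction argument: each iterate is safe because the previous one was and the step stayed inside the certified ball.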
Similar Papers
Proactive Constrained Policy Optimization with Preemptive Penalty
Machine Learning (CS)
Teaches robots to learn safely without breaking rules.
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Machine Learning (CS)
Keeps AI smart while making it safe.
Multi-Objective Reward and Preference Optimization: Theory and Algorithms
Machine Learning (CS)
Teaches computers to make safe, smart choices.