PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation
By: Yuanzhe Liu, Jingyuan Zhu, Yuchen Mo and more
Potential Business Impact:
Robots learn to do many steps without messing up.
Recent advancements in vision-language-action (VLA) models have shown promise in robotic manipulation, yet they continue to struggle with long-horizon, multi-step tasks. Existing methods lack internal reasoning mechanisms that can identify task-relevant interaction cues or track progress within a subtask, leading to critical execution errors such as repeated actions, missed steps, and premature termination. To address these challenges, we introduce PALM, a VLA framework that structures policy learning around interaction-centric affordance reasoning and subtask progress cues. PALM distills complementary affordance representations that capture object relevance, contact geometry, spatial placements, and motion dynamics, and serve as task-relevant anchors for visuomotor control. To further stabilize long-horizon execution, PALM predicts continuous within-subtask progress, enabling seamless subtask transitions. Across extensive simulation and real-world experiments, PALM consistently outperforms baselines, achieving a 91.8% success rate on LIBERO-LONG, a 12.5% improvement in average length on CALVIN ABC->D, and a 2x improvement over real-world baselines across three long-horizon generalization settings.
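The abstract says PALM predicts continuous within-subtask progress to stabilize subtask transitions, but it does not say how that signal is consumed. As an illustration only, the Python sketch below shows one plausible way a progress estimate in [0, 1] could gate transitions. The class name `SubtaskTracker`, the `threshold` and `patience` parameters, and the progress values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class SubtaskTracker:
    """Hypothetical helper: steps through subtasks using a progress estimate in [0, 1]."""
    subtasks: Sequence[str]
    threshold: float = 0.95  # assumed completion threshold (not from the paper)
    patience: int = 3        # assumed number of consecutive confident steps required
    index: int = 0
    _streak: int = field(default=0, init=False, repr=False)

    @property
    def done(self) -> bool:
        return self.index >= len(self.subtasks)

    @property
    def current(self) -> Optional[str]:
        return None if self.done else self.subtasks[self.index]

    def update(self, progress: float) -> bool:
        """Consume one progress prediction; return True if the subtask just advanced."""
        if self.done:
            return False
        progress = min(max(progress, 0.0), 1.0)  # clamp noisy predictions
        # Count consecutive steps at or above the threshold; reset on any dip.
        self._streak = self._streak + 1 if progress >= self.threshold else 0
        if self._streak >= self.patience:
            self.index += 1
            self._streak = 0
            return True
        return False


def run(tracker: SubtaskTracker, progress_stream: Sequence[float]) -> List[Optional[str]]:
    """Return the subtask that was active at each step of the stream."""
    active = []
    for p in progress_stream:
        active.append(tracker.current)
        tracker.update(p)
    return active


# Made-up progress values: the isolated spike at step 1 does not trigger a transition.
tracker = SubtaskTracker(["open drawer", "place block"], threshold=0.9, patience=2)
active = run(tracker, [0.2, 0.95, 0.5, 0.92, 0.97, 0.1, 0.95, 0.99])
```

Requiring several consecutive high-progress steps is one simple guard against the premature-termination failure the abstract mentions. Only advancing once progress is high guards against skipping a step. A learned policy could instead condition on the progress value directly.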