Are LLMs The Way Forward? A Case Study on LLM-Guided Reinforcement Learning for Decentralized Autonomous Driving
By: Timur Anvar, Jeffrey Chen, Yuyan Wang and more
Potential Business Impact:
Helps self-driving cars learn safer highway driving.
Autonomous vehicle navigation in complex environments such as dense and fast-moving highways and merging scenarios remains an active area of research. A key limitation of RL is its reliance on well-specified reward functions, which often fail to capture the full semantic and social complexity of diverse, out-of-distribution situations. As a result, a rapidly growing line of research explores using Large Language Models (LLMs) to replace or supplement RL for direct planning and control, on account of their ability to reason about rich semantic context. However, LLMs present significant drawbacks: they can be unstable in zero-shot safety-critical settings, produce inconsistent outputs, and often depend on expensive API calls with network latency. This motivates our investigation into whether small, locally deployed LLMs (< 14B parameters) can meaningfully support autonomous highway driving through reward shaping rather than direct control. We present a case study comparing RL-only, LLM-only, and hybrid approaches, where LLMs augment RL rewards by scoring state-action transitions during training, while standard RL policies execute at test time. Our findings reveal that RL-only agents achieve moderate success rates (73-89%) with reasonable efficiency, LLM-only agents can reach higher success rates (up to 94%) but with severely degraded speed performance, and hybrid approaches consistently fall between these extremes. Critically, despite explicit efficiency instructions, LLM-influenced approaches exhibit systematic conservative bias with substantial model-dependent variability, highlighting important limitations of current small LLMs for safety-critical control tasks.
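The hybrid setup described in the abstract, in which an LLM scores state-action transitions during training and the score is added to the RL reward, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Transition` fields, the action names, the additive combination, the `weight` value, and the rule-based `stub_llm_score` (standing in for a call to a locally hosted LLM) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transition:
    """Hypothetical summary of one highway state-action transition."""
    speed_mps: float    # ego vehicle speed
    min_gap_m: float    # distance to the nearest vehicle ahead
    action: str         # e.g. "FASTER", "SLOWER", "IDLE" (assumed names)
    env_reward: float   # reward returned by the simulator


def stub_llm_score(t: Transition) -> float:
    """Stand-in for an LLM call that rates a transition in [0, 1].

    A real scorer would prompt a small local model with a text
    description of the transition and parse a numeric rating.
    """
    score = 0.5
    if t.min_gap_m < 10.0 and t.action == "FASTER":
        score -= 0.4  # accelerating while tailgating looks unsafe
    if t.min_gap_m >= 30.0 and t.action == "FASTER":
        score += 0.3  # using open road is efficient
    if t.min_gap_m >= 30.0 and t.action == "SLOWER":
        score -= 0.2  # braking on open road is overly conservative
    return max(0.0, min(1.0, score))


def shaped_reward(
    t: Transition,
    scorer: Callable[[Transition], float] = stub_llm_score,
    weight: float = 0.5,
) -> float:
    """Combine the simulator reward with the scorer's rating (assumed additive)."""
    return t.env_reward + weight * scorer(t)


# Used only during training; at test time the learned RL policy acts alone.
tailgating = Transition(speed_mps=25.0, min_gap_m=5.0, action="FASTER", env_reward=1.0)
open_road = Transition(speed_mps=25.0, min_gap_m=50.0, action="FASTER", env_reward=1.0)
print(shaped_reward(tailgating), shaped_reward(open_road))
```

Because the scorer is consulted only while collecting training transitions, the deployed policy avoids LLM inference latency at test time. Any bias in the scorer, such as the conservative tendency the abstract reports, would still be baked into the learned policy through the shaped reward.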