Logic-based Task Representation and Reward Shaping in Multiagent Reinforcement Learning
By: Nishant Doshi
Potential Business Impact:
Teaches robots to work together faster.
This paper presents an approach for accelerated learning of optimal plans in multi-agent systems for tasks specified in Linear Temporal Logic (LTL). Given a set of options (temporally abstract actions) available to each agent, we convert the task specification into the corresponding Büchi automaton and proceed with a model-free approach that collects transition samples and constructs a product semi-Markov decision process (SMDP) on the fly. Value-based reinforcement learning algorithms can then be used to synthesize a correct-by-design controller without learning the underlying transition model of the multi-agent system. The exponential sample complexity arising from multiple agents is addressed with a novel reward shaping approach. We test the proposed algorithm on different tasks in a deterministic gridworld simulation and find that the reward shaping yields a significant reduction in convergence times. We also observe that options become increasingly relevant as the state and action spaces of multi-agent systems grow.
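To make the pipeline concrete, the sketch below illustrates (under stated assumptions, not as the authors' implementation) how a product state can be built on the fly from an environment state and an automaton state, and how potential-based reward shaping over the automaton can be layered on top of ordinary Q-learning. The gridworld, the hand-compiled automaton for "eventually visit A, then eventually visit B", and all names (`automaton_step`, `potential`, `env_step`, `q_learning`) are illustrative assumptions; the paper's options and Büchi acceptance handling are simplified to a reachability-style accepting state.

```python
# Minimal sketch: on-the-fly product construction + potential-based reward shaping.
# Not the paper's code; automaton, world, and hyperparameters are illustrative.
import random
from collections import defaultdict

# Hand-compiled automaton (stand-in for an LTL-to-Buchi translation) for the task
# "eventually visit A, then eventually visit B": 0 -> 1 on label 'A', 1 -> 2 on 'B'.
ACCEPTING = 2
def automaton_step(q, label):
    if q == 0 and label == 'A':
        return 1
    if q == 1 and label == 'B':
        return 2
    return q

# Shaping potential: negative distance (in automaton transitions) to acceptance.
DIST = {0: 2, 1: 1, 2: 0}
def potential(q):
    return -float(DIST[q])

# Deterministic 1-D world with two agents; label 'A' when agent 0 reaches cell 4,
# label 'B' when agent 1 reaches cell 0. Joint actions move each agent by -1/0/+1.
SIZE = 5
def env_step(state, joint_action):
    return tuple(min(SIZE - 1, max(0, s + a)) for s, a in zip(state, joint_action))

def label_of(state):
    if state[0] == 4:
        return 'A'
    if state[1] == 0:
        return 'B'
    return None

JOINT_ACTIONS = [(a0, a1) for a0 in (-1, 0, 1) for a1 in (-1, 0, 1)]

def q_learning(episodes=2000, gamma=0.95, alpha=0.1, eps=0.2, shaped=True):
    Q = defaultdict(float)  # Q-values over product states (env_state, automaton_state)
    for _ in range(episodes):
        s, q = (0, 4), 0  # product state assembled on the fly; no model is learned
        for _ in range(50):
            if random.random() < eps:
                a = random.choice(JOINT_ACTIONS)
            else:
                a = max(JOINT_ACTIONS, key=lambda act: Q[((s, q), act)])
            s2 = env_step(s, a)
            q2 = automaton_step(q, label_of(s2))
            r = 1.0 if q2 == ACCEPTING else 0.0
            if shaped:  # potential-based shaping leaves the optimal policy unchanged
                r += gamma * potential(q2) - potential(q)
            best_next = max(Q[((s2, q2), act)] for act in JOINT_ACTIONS)
            Q[((s, q), a)] += alpha * (r + gamma * best_next - Q[((s, q), a)])
            s, q = s2, q2
            if q == ACCEPTING:
                break
    return Q

if __name__ == "__main__":
    random.seed(0)
    Q = q_learning()
    print("Learned", len(Q), "product-state Q-values")
```

The shaping term rewards progress through the automaton (e.g., triggering label 'A') long before the sparse acceptance reward is seen, which is one way the exponential joint-state sample complexity can be mitigated in practice.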
Similar Papers
Zero-Shot Instruction Following in RL via Structured LTL Representations
Artificial Intelligence
Teaches robots to follow complex, multi-step instructions.
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Machine Learning (CS)
Teaches robots to follow rules safely and fast.
Motion Planning Under Temporal Logic Specifications In Semantically Unknown Environments
Robotics
Helps robots navigate unknown places to do jobs.