Generalized Linear Markov Decision Process
By: Sinian Zhang, Kaicheng Zhang, Ziping Xu and more
Potential Business Impact:
Helps computers learn with less reward information.
The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption that both transition dynamics and reward functions are linear in the same feature space limits its applicability in real-world domains, where rewards often exhibit nonlinear or discrete structures. Motivated by applications such as healthcare and e-commerce, where data is scarce and reward signals can be binary or count-valued, we propose the Generalized Linear MDP (GLMDP) framework, an extension of the linear MDP framework that models rewards using generalized linear models (GLMs) while maintaining linear transition dynamics. We establish the Bellman completeness of GLMDPs with respect to a new function class that accommodates nonlinear rewards and develop two offline RL algorithms: Generalized Pessimistic Value Iteration (GPEVI) and a semi-supervised variant (SS-GPEVI) that utilizes both labeled and unlabeled trajectories. Our algorithms achieve theoretical guarantees on policy suboptimality and demonstrate improved sample efficiency in settings where reward labels are expensive or limited.
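As a rough sketch of the model class the abstract describes (written in standard linear-MDP notation; the feature map φ, parameters μ_h and θ_h, and link function g below are assumed symbols, not necessarily the paper's exact formulation), a GLMDP keeps the linear transition assumption and replaces the linear reward with a generalized linear one:

  P_h(s' | s, a) = ⟨ φ(s, a), μ_h(s') ⟩        (transition dynamics, linear in φ)
  E[ r_h | s, a ] = g( ⟨ φ(s, a), θ_h ⟩ )       (reward, generalized linear in φ)

Here g is a known link function chosen to match the reward type, e.g. the logistic function g(x) = 1 / (1 + e^{-x}) for binary rewards or g(x) = e^{x} for count-valued rewards, which is how nonlinear and discrete reward structures enter while the transitions stay linear.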
Similar Papers
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Artificial Intelligence
Teaches robots to learn from hidden rewards.
Improving Controller Generalization with Dimensionless Markov Decision Processes
Machine Learning (CS)
Helps robots learn better in new places.
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
Artificial Intelligence
Solves hard computer puzzles faster using a new math trick.