Learning to Lead: Incentivizing Strategic Agents in the Dark
By: Yuchen Wu, Xinyi Zhong, Zhuoran Yang
Potential Business Impact:
Helps a principal (e.g., an employer) learn how to incentivize strategic workers who may misreport private information.
We study an online learning version of the generalized principal-agent model, in which a principal interacts repeatedly with a strategic agent who has private types and private rewards and takes unobservable actions. The agent is non-myopic, optimizing a discounted sum of future rewards, and may strategically misreport types to manipulate the principal's learning. The principal, observing only her own realized rewards and the agent's reported types, aims to learn an optimal coordination mechanism that minimizes strategic regret. We develop the first provably sample-efficient algorithm for this challenging setting. Our approach features a novel pipeline that combines (i) a delaying mechanism to incentivize approximately myopic agent behavior, (ii) an innovative reward-angle estimation framework that uses sector tests and a matching procedure to recover type-dependent reward functions, and (iii) a pessimistic-optimistic LinUCB algorithm that enables the principal to explore efficiently while respecting the agent's incentive constraints. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy, where $\tilde{O}(\cdot)$ omits logarithmic factors. Our results open up new avenues for designing robust online learning algorithms for a wide range of game-theoretic settings involving private types and strategic agents.
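To make component (iii) concrete, here is a minimal Python sketch of one pessimistic-optimistic LinUCB round: the principal screens out arms whose pessimistic (lower-confidence) estimate of the incentive constraint might fall short, then chooses optimistically (by upper confidence bound) among the remaining arms. The class name, the parameters `constraint_theta` and `threshold`, and the specific screening rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of a pessimistic-optimistic LinUCB step, loosely
# inspired by component (iii) above. All names, dimensions, and the
# constraint-handling rule are assumptions made for this example.

import numpy as np

class PessimisticOptimisticLinUCB:
    def __init__(self, dim, alpha=1.0, beta=1.0, lam=1.0):
        self.alpha = alpha          # optimism width for the principal's reward
        self.beta = beta            # pessimism width for the incentive constraint
        self.A = lam * np.eye(dim)  # ridge-regularized design matrix
        self.b = np.zeros(dim)      # accumulated feature-weighted rewards

    def select(self, arms, constraint_theta, threshold):
        """Pick the arm maximizing an optimistic reward estimate among arms
        whose pessimistic incentive-constraint estimate clears `threshold`."""
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        best, best_ucb = None, -np.inf
        for x in arms:
            width = np.sqrt(x @ A_inv @ x)
            # Pessimistic check: lower confidence bound on the constraint value.
            if x @ constraint_theta - self.beta * width < threshold:
                continue  # skip arms that might violate the agent's incentives
            # Optimistic objective: upper confidence bound on principal reward.
            ucb = x @ theta_hat + self.alpha * width
            if ucb > best_ucb:
                best, best_ucb = x, ucb
        return best

    def update(self, x, reward):
        """Standard ridge-regression (LinUCB-style) update after observing reward."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

In each round the principal would call `select(...)` to pick an action, observe her realized reward, and call `update(...)`; the pessimism parameter `beta` controls how conservatively the incentive constraint is enforced relative to the exploration bonus `alpha`.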
Similar Papers
Learning a Game by Paying the Agents
CS and Game Theory
Shows how to infer players' payoffs in an unknown game by paying the agents.
Desirable Effort Fairness and Optimality Trade-offs in Strategic Learning
CS and Game Theory
Characterizes trade-offs between effort fairness and optimality when people strategically game a learning system.
Steering No-Regret Agents in MFGs under Model Uncertainty
Machine Learning (CS)
Shows how to steer large populations of no-regret learning agents in mean-field games despite model uncertainty.