Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
By: Zixian Yang, Sushil Mahavir Varma, Lei Ying
Potential Business Impact:
Smarter system matches people and jobs faster.
We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-\gamma})$ regret, $\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$ maximum queue length for $\gamma \in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $\gamma$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
Similar Papers
Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond
CS and Game Theory
Helps computers learn best prices to sell things.
Online Learning of Optimal Sequential Testing Policies
Machine Learning (CS)
Helps pick the best tests for people faster.
Latency and Ordering Effects in Online Decisions
Machine Learning (CS)
Improves computer decisions with slow, mixed-up information.