Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest

Published: September 5, 2025 | arXiv ID: 2509.05292v1

By: Xiao Yang, Mehdi Ben Ayed, Longyu Zhao and more

Potential Business Impact:

Makes online ads show better for you.

Business Areas:

Personalization, Commerce and Shopping

The ranking utility function in an ad recommender system, which linearly combines predictions of various business goals, plays a central role in balancing values across the platform, advertisers, and users. Traditional manual tuning, while offering simplicity and interpretability, often yields suboptimal results due to its unprincipled tuning objectives, the vast number of parameter combinations, and its lack of personalization and adaptability to seasonality. In this work, we propose a general Deep Reinforcement Learning framework for Personalized Utility Tuning (DRL-PUT) to address the challenges of multi-objective optimization within ad recommender systems. Our key contributions include: 1) Formulating the problem as a reinforcement learning task: given the state of an ad request, we predict the optimal hyperparameters to maximize a pre-defined reward. 2) Developing an approach to directly learn an optimal policy model using online serving logs, avoiding the need to estimate a value function, which is inherently challenging due to the high variance and unbalanced distribution of immediate rewards. We evaluated DRL-PUT through an online A/B experiment in Pinterest's ad recommender system. Compared to the baseline manual utility tuning approach, DRL-PUT improved the click-through rate by 9.7% and the long click-through rate by 7.7% on the treated segment. We conducted a detailed ablation study on the impact of different reward definitions and analyzed the personalization aspect of the learned policy model.
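To make the idea of a linear ranking utility concrete, here is a minimal sketch in Python. It is not the paper's implementation: the objective names (`p_click`, `p_long_click`), the ad ids, the prediction values, and both weight sets are hypothetical and chosen only for illustration. It shows how the same candidate ads can be ordered differently when the utility weights (the hyperparameters being tuned) change from one global setting to a per-request setting, such as one that a learned policy might output.

```python
from typing import Dict, List

# Hypothetical per-ad predictions keyed by objective name.
Predictions = Dict[str, float]


def utility(pred: Predictions, weights: Dict[str, float]) -> float:
    """Linear ranking utility: weighted sum of per-objective predictions."""
    return sum(weights[name] * pred[name] for name in weights)


def rank_ads(ads: Dict[str, Predictions], weights: Dict[str, float]) -> List[str]:
    """Return ad ids sorted by descending utility."""
    return sorted(ads, key=lambda ad_id: utility(ads[ad_id], weights), reverse=True)


# Illustrative candidates for one ad request (made-up numbers).
ads = {
    "ad_a": {"p_click": 0.10, "p_long_click": 0.02},
    "ad_b": {"p_click": 0.06, "p_long_click": 0.05},
}

# Manual tuning: one global weight vector shared by every request.
global_weights = {"p_click": 1.0, "p_long_click": 1.0}

# Stand-in for weights predicted from the request state by a policy model
# (assumed values; no model is trained in this sketch).
per_request_weights = {"p_click": 1.0, "p_long_click": 3.0}

ranking_global = rank_ads(ads, global_weights)
ranking_personalized = rank_ads(ads, per_request_weights)
```

With the global weights, `ad_a` ranks first; with the heavier weight on long clicks, `ad_b` moves ahead. In a setup like the one the abstract describes, the per-request weights would come from the policy model rather than being hard-coded.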

RewardRank: Optimizing True Learning-to-Rank Utility

Information Retrieval

Shows online stores what shoppers really want.

19 Aug 2025 1

87%

Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization

Information Retrieval

Makes movie suggestions better and faster.

10 Nov 2025 1

86%

Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards

Machine Learning (CS)

Helps ads reach the right people for more money.

22 Oct 2025 0

Country of Origin

🇺🇸 United States

Page Count

10 pages
