Balancing Semantic Relevance and Engagement in Related Video Recommendations
By: Amit Jaspal, Feng Zhang, Wei Chang, and more
Potential Business Impact:
Shows you videos that are actually about what you like.
Related video recommendations commonly use collaborative filtering (CF) driven by co-engagement signals, which often yields recommendations that lack semantic coherence and exhibit strong popularity bias. This paper introduces a novel multi-objective retrieval framework that extends standard two-tower models to explicitly balance semantic relevance and user engagement. Our approach uniquely combines: (a) multi-task learning (MTL) to jointly optimize co-engagement and semantic relevance, explicitly prioritizing topical coherence; (b) fusion of multimodal content features (textual and visual embeddings) for richer semantic understanding; and (c) off-policy correction (OPC) via inverse propensity weighting to mitigate popularity bias. Evaluation on industrial-scale data and a two-week live A/B test demonstrates the framework's efficacy. We observed significant improvements in semantic relevance (topic match rate up from 51% to 63%), a reduced concentration on popular items (-13.8% popular video recommendations), and a +0.04% improvement in our topline user engagement metric. Our method achieves better semantic coherence, balanced engagement, and practical scalability for real-world deployment.
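To make the abstract's recipe concrete, below is a minimal, hypothetical sketch of a multi-objective two-tower retrieval loss of the kind described: an in-batch softmax engagement objective reweighted by inverse propensity (off-policy correction for popularity bias), plus a semantic-relevance objective that aligns the item tower with fused textual and visual content embeddings. All module names, dimensions, and the 0.5 trade-off weight are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Simple MLP tower producing unit-norm embeddings."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)

class TwoTowerMTL(nn.Module):
    def __init__(self, query_dim: int, item_dim: int, content_dim: int,
                 emb_dim: int = 64, alpha: float = 0.5, tau: float = 0.07):
        super().__init__()
        self.query_tower = Tower(query_dim, emb_dim)
        # Item tower fuses engagement/ID features with multimodal content features.
        self.item_tower = Tower(item_dim + content_dim, emb_dim)
        # Projection of the content embedding used as the semantic target.
        self.content_proj = nn.Linear(content_dim, emb_dim)
        self.alpha, self.tau = alpha, tau  # task trade-off and softmax temperature

    def loss(self, query_feats, item_feats, content_emb, propensity):
        q = self.query_tower(query_feats)                              # [B, D]
        v = self.item_tower(torch.cat([item_feats, content_emb], -1))  # [B, D]

        # Engagement task: in-batch softmax over co-engaged (query, item) pairs,
        # with inverse-propensity weights to down-weight over-exposed popular items.
        logits = q @ v.t() / self.tau
        labels = torch.arange(q.size(0), device=q.device)
        ips = 1.0 / propensity.clamp(min=1e-3)
        ips = ips / ips.mean()  # normalize so the loss scale stays stable
        engagement = (F.cross_entropy(logits, labels, reduction="none") * ips).mean()

        # Semantic task: keep item embeddings close to their content embedding.
        target = F.normalize(self.content_proj(content_emb), dim=-1)
        semantic = (1.0 - (v * target).sum(-1)).mean()

        return self.alpha * engagement + (1.0 - self.alpha) * semantic

# Usage sketch with random tensors standing in for real features.
model = TwoTowerMTL(query_dim=32, item_dim=16, content_dim=48)
B = 8
loss = model.loss(torch.randn(B, 32), torch.randn(B, 16),
                  torch.randn(B, 48), propensity=torch.rand(B))
loss.backward()
```

In a setup like this, lowering alpha pushes the retrieved neighbors toward topical coherence, while the propensity weights serve as the off-policy correction that counteracts popularity bias in the logged co-engagement data.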
Similar Papers
Semantic Item Graph Enhancement for Multimodal Recommendation
Information Retrieval
Helps online stores show you better stuff.
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Information Retrieval
Helps video apps understand what you *really* like.
Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion
CV and Pattern Recognition
Finds specific video moments using smart searching.