TrackRec: Iterative Alternating Feedback with Chain-of-Thought via Preference Alignment for Recommendation
By: Yu Xia, Rui Zhong, Zeyu Song, and more
Potential Business Impact:
Helps computers guess what you want to buy.
The extensive world knowledge and powerful reasoning capabilities of large language models (LLMs) have attracted significant attention in recommendation systems (RS). In particular, chain-of-thought (CoT) reasoning has been shown to improve the performance of LLMs on complex reasoning tasks for RS. However, because LLMs often suffer from hallucination, there is no guarantee that their reasoning chains are effective. A key challenge is to further enhance the recommendation capabilities of LLMs through effective CoT reasoning. We therefore propose TrackRec, a framework designed to enhance the reasoning capabilities of LLMs for RS. TrackRec focuses on accurately inferring a recommendation chain of thought (RecCoT) for user preferences using the knowledge of LLMs. The RecCoT can serve both as an explanation when the LLM completes recommendation tasks and as auxiliary features that assist recommendation models in accomplishing those tasks. TrackRec consists of a RecCoT generator $(G)$ and a RecCoT validator $(V)$. We further design an alternating feedback learning mechanism: $G$ undergoes direct preference optimization guided by feedback from $V$, producing increasingly accurate RecCoT aligned with $V$'s standards, while $V$ is fine-tuned on the inference feedback from $G$ to strengthen its validation capabilities in alignment with the recommendation task. Through iterative alternating feedback learning between $G$ and $V$, TrackRec continuously improves the user preference analysis capability of $G$ and the validation capacity of $V$. Extensive experiments demonstrate the effectiveness of our approach, which surpasses state-of-the-art methods. Moreover, TrackRec has been deployed on a large advertising platform with hundreds of millions of users, achieving substantial gains.
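The abstract describes an alternating training loop between the generator $G$ and the validator $V$. The sketch below illustrates what such a loop could look like under stated assumptions; the method names (generate_reccot, score, dpo_update, finetune) and the use of observed user outcomes as validator labels are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of the alternating feedback loop described in the abstract.
# All method names and the choice of supervision signal are assumptions made
# for illustration; they do not reflect the authors' released code.

def trackrec_alternating_training(G, V, users, num_rounds=3, num_samples=4):
    """Alternately improve the RecCoT generator G and the RecCoT validator V."""
    for _ in range(num_rounds):
        # Step 1: G is updated with direct preference optimization (DPO),
        # using V's scores to rank sampled reasoning chains (RecCoT).
        preference_pairs = []
        for user in users:
            candidates = [G.generate_reccot(user) for _ in range(num_samples)]
            ranked = sorted(candidates, key=lambda c: V.score(user, c), reverse=True)
            # Highest- vs. lowest-scored chain forms a (chosen, rejected) pair.
            preference_pairs.append((user, ranked[0], ranked[-1]))
        G.dpo_update(preference_pairs)

        # Step 2: V is fine-tuned on G's inference feedback, grounding its
        # validation standard in observed recommendation outcomes
        # (e.g., whether the recommendation driven by the RecCoT was clicked).
        validation_examples = []
        for user in users:
            reccot = G.generate_reccot(user)
            label = user.observed_outcome  # assumed ground-truth signal
            validation_examples.append((user, reccot, label))
        V.finetune(validation_examples)
    return G, V
```

The key design point the sketch tries to capture is the alternation: each round, $G$ is aligned to $V$'s current standards via preference pairs, and $V$'s standards are then re-anchored to actual recommendation outcomes produced with $G$'s chains.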
Similar Papers
OneRec-Think: In-Text Reasoning for Generative Recommendation
Information Retrieval
Helps apps understand you better to keep you engaged.
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
Machine Learning (CS)
Makes smart computers think smarter, faster, and cheaper.
Latent Chain-of-Thought for Visual Reasoning
Artificial Intelligence
Makes AI think step-by-step better for new problems.