RankPO: Preference Optimization for Job-Talent Matching
By: Yafei Zhang, Murray Wang, Yu Wang, and more
Potential Business Impact:
Helps find the right person for a job.
Matching job descriptions (JDs) with suitable talent requires models capable of understanding not only textual similarities between JDs and candidate resumes but also contextual factors such as geographical location and academic seniority. To address this challenge, we propose a two-stage training framework for large language models (LLMs). In the first stage, a contrastive learning approach is used to train the model on a dataset constructed from real-world matching rules, such as geographical alignment and research area overlap. While effective, this model primarily learns patterns defined by the matching rules. In the second stage, we introduce a novel preference-based fine-tuning method inspired by Direct Preference Optimization (DPO), termed Rank Preference Optimization (RankPO), to align the model with AI-curated pairwise preferences emphasizing textual understanding. Our experiments show that while the first-stage model achieves strong performance on rule-based data (nDCG@20 = 0.706), it lacks robust textual understanding (alignment with AI annotations = 0.46). By fine-tuning with RankPO, we obtain a balanced model that retains relatively good performance on the original tasks while significantly improving alignment with AI preferences. The code and data are available at https://github.com/yflyzhang/RankPO.
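To make the two stages concrete, below is a minimal PyTorch sketch of what such a pipeline could look like: an in-batch-negative contrastive loss for the first stage and a DPO-style pairwise preference loss for the second. The function names, the temperature and beta hyperparameters, and the use of embedding similarity as the score are illustrative assumptions; the paper's exact RankPO objective may differ, so consult the linked repository for the authors' implementation.

```python
# Hypothetical sketch of the two training objectives described in the abstract.
# Assumes JD and candidate embeddings are L2-normalized; names and defaults are
# illustrative, not the authors' exact implementation.
import torch
import torch.nn.functional as F


def contrastive_loss(jd_emb: torch.Tensor, cand_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """Stage 1: in-batch contrastive loss where the i-th JD is paired with
    the i-th candidate built from rule-based matches (e.g., geography,
    research-area overlap); all other candidates in the batch are negatives."""
    logits = jd_emb @ cand_emb.T / temperature          # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


def rankpo_style_loss(score_chosen: torch.Tensor,
                      score_rejected: torch.Tensor,
                      ref_score_chosen: torch.Tensor,
                      ref_score_rejected: torch.Tensor,
                      beta: float = 0.1) -> torch.Tensor:
    """Stage 2: DPO-inspired pairwise preference loss over AI-curated pairs.
    `score_*` are similarity scores s(jd, candidate) from the current model;
    `ref_score_*` come from the frozen first-stage (reference) model, which
    regularizes the update so rule-based performance is not discarded."""
    policy_margin = score_chosen - score_rejected
    ref_margin = ref_score_chosen - ref_score_rejected
    # Encourage the policy's margin for the AI-preferred candidate to exceed
    # the reference model's margin: -log sigmoid(beta * margin difference).
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```

Keeping a frozen first-stage model as the reference is what lets the second stage trade off the two signals: the loss only rewards preference margins beyond what the rule-trained model already assigns, which is consistent with the reported balance between rule-based nDCG and alignment with AI annotations.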
Similar Papers
In-context Ranking Preference Optimization
Machine Learning (CS)
Helps computers learn to rank answers better.
K-order Ranking Preference Optimization for Large Language Models
Information Retrieval
Helps computers rank search results better.
BPO: Revisiting Preference Modeling in Direct Preference Optimization
Computation and Language
Makes AI better at math and following instructions.