BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation
By: Peng Sun, Xiangyu Zhang, Duan Wu
Potential Business Impact:
Lets teams measure whether users are happy with their AI assistants.
Accurate evaluation of user satisfaction is critical for the iterative development of conversational AI. However, for open-ended assistants, traditional A/B testing lacks reliable metrics: explicit feedback is sparse, and implicit signals are ambiguous. To bridge this gap, we introduce BoRP (Bootstrapped Regression Probing), a scalable framework for high-fidelity satisfaction evaluation. Unlike generative approaches, BoRP leverages the geometric properties of the LLM latent space. It employs a polarization-index-based bootstrapping mechanism to automate rubric generation and uses Partial Least Squares (PLS) regression to map hidden states to continuous satisfaction scores. Experiments on industrial datasets show that BoRP (with Qwen3-8B/14B backbones) aligns with human judgments significantly better than generative baselines, including Qwen3-Max. Furthermore, BoRP reduces inference costs by orders of magnitude, enabling full-scale monitoring and highly sensitive A/B testing via CUPED.
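The core mechanism described in the abstract, fitting a PLS regression that maps frozen LLM hidden states to continuous satisfaction scores, can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration rather than the authors' implementation: the hidden states are random stand-ins (in practice they would be extracted from a backbone such as Qwen3-8B/14B), the satisfaction labels are synthetic, and the dataset size, hidden dimension, and component count are all assumed for the example.

```python
# Minimal sketch (not the BoRP implementation): regression probing via PLS.
# Fit a Partial Least Squares model that maps per-dialogue hidden-state vectors
# to continuous human satisfaction scores, then check rank correlation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical setup: 2,000 labeled dialogues, 4,096-dim last-token hidden
# states (stand-ins for real LLM activations), satisfaction labels in [0, 1].
n_dialogues, hidden_dim = 2000, 4096
H = rng.normal(size=(n_dialogues, hidden_dim))           # stand-in hidden states
w = rng.normal(size=hidden_dim)
y = 1.0 / (1.0 + np.exp(-(H @ w) / np.sqrt(hidden_dim)))  # stand-in human labels

H_train, H_test, y_train, y_test = train_test_split(
    H, y, test_size=0.2, random_state=0
)

# PLS projects the high-dimensional hidden states onto a small number of latent
# components that are maximally covariant with the labels, then regresses on them.
pls = PLSRegression(n_components=16)
pls.fit(H_train, y_train)

y_pred = pls.predict(H_test).ravel()
rho, _ = spearmanr(y_test, y_pred)
print(f"Spearman correlation with held-out labels: {rho:.3f}")
```

One plausible reason PLS suits this setting (an assumption here, not a claim from the paper): the hidden-state dimensionality far exceeds the number of human-labeled dialogues, so projecting onto a few label-covariant latent components keeps the regression well-conditioned while remaining far cheaper at inference time than running a generative judge.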
Similar Papers
Bootstrapping LLMs via Preference-Based Policy Optimization
Artificial Intelligence
Teaches AI to follow human wishes better.
IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
Artificial Intelligence
Helps students learn better with smart lesson plans.
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Machine Learning (CS)
Teaches robots to learn faster with words.