Score: 2

STARS: Segment-level Token Alignment with Rejection Sampling in Large Language Models

Published: November 5, 2025 | arXiv ID: 2511.03827v1

By: Mohammad Atif Quamar, Mohammad Areeb, Mikhail Kuznetsov, and more

BigTech Affiliations: Amazon

Potential Business Impact:

Helps AI models produce responses that better match human preferences, without the cost of retraining the model.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Aligning large language models with human values is crucial for their safe deployment; however, existing methods such as fine-tuning are computationally expensive and suboptimal, while inference-time approaches like Best-of-N sampling require practically infeasible computation to achieve optimal alignment. We propose STARS: Segment-level Token Alignment with Rejection Sampling, a decoding-time algorithm that steers model generation by iteratively sampling, scoring, and accepting or rejecting short, fixed-size token segments. This allows for early correction of the generation path, significantly improving computational efficiency and boosting alignment quality. Across a suite of six LLMs, we show that STARS outperforms Supervised Fine-Tuning (SFT) by up to 14.9 percentage points and Direct Preference Optimization (DPO) by up to 4.3 percentage points on win rates, while remaining highly competitive with strong Best-of-N baselines. Our work establishes granular, reward-guided sampling as a generalizable, robust, and efficient alternative to traditional fine-tuning and full-sequence ranking methods for aligning LLMs.
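
To make the sample-score-accept/reject loop concrete, below is a minimal Python sketch of segment-level rejection-sampling decoding in the spirit the abstract describes. The segment length, resample budget, acceptance rule, and the `sample_segment`/`score` interfaces are all assumptions for illustration, not the paper's actual implementation.

```python
import random
from typing import Callable, List

def stars_style_decode(
    sample_segment: Callable[[List[int], int], List[int]],  # proposes the next k tokens given context
    score: Callable[[List[int]], float],                     # reward-model score for a (partial) sequence
    prompt: List[int],
    segment_len: int = 8,     # fixed segment size (assumed value)
    max_segments: int = 16,   # how many segments to generate in total
    max_tries: int = 4,       # resample budget per segment (assumed value)
) -> List[int]:
    """Sketch of decoding with segment-level rejection sampling.

    Each short segment is sampled, scored, and accepted early if it improves the
    reward over the current prefix; otherwise the best rejected candidate is kept.
    The acceptance rule here is an assumption chosen to keep the sketch simple.
    """
    tokens = list(prompt)
    for _ in range(max_segments):
        best_candidate: List[int] = []
        best_reward = float("-inf")
        accepted = False
        for _ in range(max_tries):
            candidate = sample_segment(tokens, segment_len)   # propose a short continuation
            reward = score(tokens + candidate)                # score the extended prefix
            if reward >= score(tokens):                       # accept early if the segment helps
                tokens += candidate
                accepted = True
                break
            if reward > best_reward:                          # remember the best rejected candidate
                best_candidate, best_reward = candidate, reward
        if not accepted:
            tokens += best_candidate                          # fall back to the best rejected segment
    return tokens

# Toy usage with stand-in sampler and reward (random tokens, reward = sequence length).
toy_sampler = lambda ctx, k: [random.randint(0, 100) for _ in range(k)]
toy_reward = lambda seq: float(len(seq))
print(stars_style_decode(toy_sampler, toy_reward, prompt=[1, 2, 3])[:20])
```

Because rejection happens per segment rather than per full generation, a bad continuation can be corrected after a few tokens instead of after an entire response, which is the source of the efficiency gain over full-sequence Best-of-N ranking.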

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
15 pages

Category
Computer Science:
Computation and Language