Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization
By: Joschka Braun, Carsten Eickhoff, Seyed Ali Bahrainian
Potential Business Impact:
Guides AI writing to be more helpful and safe.
Steering vectors are a lightweight method for controlling text properties by adding a learned bias to language model activations at inference time. So far, steering vectors have predominantly been evaluated in multiple-choice settings, while their effectiveness in free-form generation tasks remains understudied. Moving "Beyond Multiple Choice," we thoroughly evaluate the effectiveness of steering vectors in adaptively controlling topical focus, sentiment, toxicity, and readability in abstractive summaries of the NEWTS dataset. We find that steering effectively controls the targeted summary properties, but high steering strengths consistently degrade both intrinsic and extrinsic text quality. Compared to steering, prompting offers weaker control but preserves text quality. Combining steering and prompting yields the strongest control over text properties and offers the most favorable efficacy-quality trade-off at moderate steering strengths. Our results underscore the practical trade-off between control strength and text quality preservation when applying steering vectors to free-form generation tasks.
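To make the phrase "adding a learned bias to language model activations" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the vector is learned, so the difference-of-means construction below is an assumption, and the function names (`difference_of_means`, `steer`), the `strength` parameter, and the toy three-dimensional activations are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Assumed construction of a steering vector: mean activation on examples
    that show the target property minus mean activation on examples that do not."""
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden: Vector, steering_vector: Vector, strength: float) -> Vector:
    """Add the scaled steering vector to one hidden state: h' = h + strength * v."""
    return [h + strength * v for h, v in zip(hidden, steering_vector)]


# Toy 3-dimensional activations (hypothetical values, chosen for readability).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]

v = difference_of_means(positive_acts, negative_acts)
hidden_state = [0.5, 0.5, 0.5]

# A strength of 0.0 leaves the hidden state unchanged; larger values move it
# further along the steering direction.
for strength in (0.0, 1.0, 4.0):
    print(strength, steer(hidden_state, v, strength))
```

In a real model the same addition would be applied to the hidden states of a chosen layer during generation. The `strength` multiplier corresponds to the steering strength whose high values the abstract reports as degrading text quality.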
Similar Papers
A Unified Understanding and Evaluation of Steering Methods
Machine Learning (CS)
Guides AI to write better without retraining.
On the Limitations of Steering in Language Model Alignment
Computation and Language
Makes AI follow instructions better, but not always.
Understanding (Un)Reliability of Steering Vectors in Language Models
Machine Learning (CS)
Makes AI follow instructions better, but sometimes it gets confused.