Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
By: Yunfan Zhang, Kathleen McKeown, Smaranda Muresan
Potential Business Impact:
Lets AI adopt a specific viewpoint and answer questions from that perspective.
Large Language Models (LLMs) are typically trained to reflect a relatively uniform set of values, which limits their applicability to tasks that require understanding of nuanced human perspectives. Recent research has underscored the importance of enabling LLMs to support steerable pluralism -- the capacity to adopt a specific perspective and align generated outputs with it. In this work, we investigate whether Chain-of-Thought (CoT) reasoning techniques can be applied to building steerable pluralistic models. We explore several methods, including CoT prompting, fine-tuning on human-authored CoT, fine-tuning on synthetic explanations, and Reinforcement Learning with Verifiable Rewards (RLVR). We evaluate these approaches using the Value Kaleidoscope and OpinionQA datasets. Among the methods studied, RLVR consistently outperforms others and demonstrates strong training sample efficiency. We further analyze the generated CoT traces with respect to faithfulness and safety.
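The abstract reports that RLVR was the strongest of the methods studied. As a rough illustration of what a verifiable reward could look like in this setting, the sketch below scores a CoT completion on an OpinionQA-style multiple-choice question: the model is prompted to adopt a target group's perspective, reason step by step, and end with an "Answer:" line, and the reward checks that final choice against the group's survey response. The prompt template, the `extract_answer` format, and the binary reward are illustrative assumptions, not the paper's actual implementation.

```python
import re
from typing import Optional

# Hypothetical CoT prompt for steerable pluralism: the model adopts a
# stated perspective, reasons step by step, and must end with a single
# line of the form "Answer: <option letter>". (Assumed format.)
PROMPT_TEMPLATE = (
    "You are answering on behalf of the following group: {perspective}\n"
    "Question: {question}\n"
    "Options: {options}\n"
    "Think step by step about how this group would respond, then finish "
    "with a line of the form 'Answer: <option letter>'."
)


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final 'Answer: X' choice out of a CoT completion."""
    matches = re.findall(r"Answer:\s*([A-Z])", completion)
    return matches[-1] if matches else None


def verifiable_reward(completion: str, gold_choice: str) -> float:
    """Binary RLVR-style reward: 1.0 if the extracted final answer
    matches the target group's survey response, else 0.0."""
    return 1.0 if extract_answer(completion) == gold_choice else 0.0


if __name__ == "__main__":
    # Usage: score one sampled completion against a gold label.
    prompt = PROMPT_TEMPLATE.format(
        perspective="adults in the U.S. who live in rural areas",
        question="How concerned are you about data privacy?",
        options="A) Very  B) Somewhat  C) Not at all",
    )
    sample = "This group tends to value privacy highly... Answer: A"
    print(verifiable_reward(sample, gold_choice="A"))  # -> 1.0
```

Because the reward is computed from the final answer alone, the CoT trace itself is unconstrained, which is why the paper separately analyzes the generated traces for faithfulness and safety.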
Similar Papers
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Computation and Language
Helps computers explain why they pick answers.
Training Small Reasoning LLMs with Cognitive Preference Alignment
Computation and Language
Trains small AI to think better with less data.
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
CV and Pattern Recognition
Teaches computers to understand charts and tables better.