Ensemble Debates with Local Large Language Models for AI Alignment
By: Ephraiem Sarabamoun
Potential Business Impact:
Helps AI systems better understand what people want.
As large language models (LLMs) take on greater roles in high-stakes decisions, alignment with human values is essential. Reliance on proprietary APIs limits reproducibility and broad participation. We study whether ensemble debates among local open-source models can improve alignment-oriented reasoning. Across 150 debates spanning 15 scenarios and five ensemble configurations, ensembles outperform single-model baselines on a 7-point rubric (overall: 3.48 vs. 3.13), with the largest gains in reasoning depth (+19.4%) and argument quality (+34.1%). Improvements are strongest for truthfulness (+1.25 points) and human enhancement (+0.80). We release code, prompts, and a debate dataset, offering an accessible and reproducible foundation for ensemble-based alignment evaluation.
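To make the setup concrete, below is a minimal sketch of an ensemble debate loop with rubric-style scoring. The model names, `generate` helper, round count, and scoring stub are all illustrative assumptions, not the paper's released implementation; its actual code, prompts, and judging procedure may differ.

```python
"""Minimal sketch of an ensemble debate loop over local models.
Assumes a generate(model, prompt) helper (hypothetical here); the paper's
released code may wire this to a specific local inference backend instead."""

from statistics import mean

# Hypothetical ensemble configuration; the paper's actual model list may differ.
ENSEMBLE = ["model-a", "model-b", "model-c"]
ROUNDS = 2  # assumed number of debate rounds


def generate(model: str, prompt: str) -> str:
    """Stub for a local-model call; replace with your own local inference call."""
    return f"[{model}] response to: {prompt[:40]}..."


def run_debate(scenario: str) -> list[dict]:
    """Each model argues in turn, conditioned on the transcript so far."""
    transcript: list[dict] = []
    for rnd in range(ROUNDS):
        for model in ENSEMBLE:
            context = "\n".join(f"{t['model']}: {t['text']}" for t in transcript)
            prompt = (
                f"Scenario: {scenario}\n"
                f"Debate so far:\n{context or '(none)'}\n"
                f"Round {rnd + 1}: give your strongest alignment-focused argument."
            )
            transcript.append(
                {"model": model, "round": rnd, "text": generate(model, prompt)}
            )
    return transcript


def score_rubric(transcript: list[dict]) -> float:
    """Placeholder 7-point rubric score; the paper scores dimensions such as
    reasoning depth and argument quality with a separate judging step."""
    return mean(4.0 for _ in transcript)  # constant stub, shows aggregation shape only


if __name__ == "__main__":
    t = run_debate("Should an assistant defer to a user request it believes is harmful?")
    print(f"turns={len(t)}, rubric score={score_rubric(t):.2f}")
```

Swapping the stubbed `generate` for a real local runtime and replacing the constant rubric score with a judging prompt would reproduce the overall structure described in the abstract: multiple local models debate a scenario, and the resulting transcript is scored against a fixed rubric.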
Similar Papers
DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates
Computation and Language
Helps computers learn how people change their minds.
To Mask or to Mirror: Human-AI Alignment in Collective Reasoning
Artificial Intelligence
AI groups copy or fix human group biases.
AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs
Computation and Language
AI copies judges, not its own ideas.