Score: 1

The Human-AI Hybrid Delphi Model: A Structured Framework for Context-Rich, Expert Consensus in Complex Domains

Published: August 12, 2025 | arXiv ID: 2508.09349v1

By: Cathy Speed, Ahmed A. Metwally

BigTech Affiliations: Google

Potential Business Impact:

Helps experts agree faster on complex topics.

Expert consensus plays a critical role in domains where evidence is complex, conflicting, or insufficient for direct prescription. Traditional methods, such as Delphi studies, consensus conferences, and systematic guideline synthesis, offer structure but face limitations including high panel burden, interpretive oversimplification, and suppression of conditional nuance. These challenges are now exacerbated by information overload, fragmentation of the evidence base, and increasing reliance on publicly available sources that lack expert filtering. This study introduces and evaluates a Human-AI Hybrid Delphi (HAH-Delphi) framework designed to augment expert consensus development by integrating a generative AI model (Gemini 2.5 Pro), small panels of senior human experts, and structured facilitation. The HAH-Delphi was tested in three phases: retrospective replication, prospective comparison, and applied deployment in two applied domains (endurance training and resistance and mixed cardio/strength training). The AI replicated 95% of published expert consensus conclusions in Phase I and showed 95% directional agreement with senior human experts in Phase II, though it lacked experiential and pragmatic nuance. In Phase III, compact panels of six senior experts achieved >90% consensus coverage and reached thematic saturation before the final participant. The AI provided consistent, literature-grounded scaffolding that supported divergence resolution and accelerated saturation. The HAH-Delphi framework offers a flexible, scalable approach for generating high-quality, context-sensitive consensus. Its successful application across health, coaching, and performance science confirms its methodological robustness and supports its use as a foundation for generating conditional, personalised guidance and published consensus frameworks at scale.

Country of Origin
🇺🇸 United States

Page Count
38 pages

Category
Computer Science:
Computation and Language