MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
By: Zizhen Li , Chuanhao Li , Yibin Wang and more
Recent advancements have expanded the role of Large Language Models in board games from playing agents to creative co-designers. However, a critical gap remains: current systems lack the capacity to offer constructive critique grounded in the emergent user experience. Bridging this gap is fundamental for harmonizing Human-AI collaboration, as it empowers designers to refine their creations via external perspectives while steering models away from biased or unpredictable outcomes. Automating critique for board games presents two challenges: inferring the latent dynamics connecting rules to gameplay without an explicit engine, and modeling the subjective heterogeneity of diverse player groups. To address these, we curate a dataset of 1,727 structurally corrected rulebooks and 150K reviews selected via quality scoring and facet-aware sampling. We augment this data with Mechanics-Dynamics-Aesthetics (MDA) reasoning to explicitly bridge the causal gap between written rules and player experience. We further distill player personas and introduce MeepleLM, a specialized model that internalizes persona-specific reasoning patterns to accurately simulate the subjective feedback of diverse player archetypes. Experiments demonstrate that MeepleLM significantly outperforms latest commercial models (e.g., GPT-5.1, Gemini3-Pro) in community alignment and critique quality, achieving a 70% preference rate in user studies assessing utility. MeepleLM serves as a reliable virtual playtester for general interactive systems, marking a pivotal step towards audience-aligned, experience-aware Human-AI collaboration.
Similar Papers
Large language models replicate and predict human cooperation across experiments in game theory
Artificial Intelligence
Makes computers act like people making choices.
Social Simulations with Large Language Model Risk Utopian Illusion
Computation and Language
Computers show fake, too-nice people in chats.
LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
Computation and Language
Lets AI judge other AI's answers fairly.