Tied Pools and Drawn Games
By: Roderick Edwards
Potential Business Impact:
Figures out who's best when games can be draws.
We consider the problem of estimating 'preference' or 'strength' parameters in three-way comparison experiments, each composed of a series of paired comparisons, but where only the single 'preferred' or 'strongest' candidate is known in each trial. Such experiments arise in psychology and market research, but here we use chess competitions as the prototypical context, in particular a series of 'pools' between three players that occurred in 1821. The possibilities of tied pools, redundant and therefore unplayed games, and drawn games must all be considered. This leads us to reconsider previous models for estimating strength parameters when drawn games are a possible result. In particular, Davidson's method for ties has been questioned, and we propose an alternative. We argue that the most appropriate use of this method is to estimate strength parameters first, and then fix these to estimate a draw-propensity parameter, rather than estimating all parameters simultaneously, as Davidson does. This results in a model that is consistent with, and provides more context for, a simple method for handling draws proposed by Glickman. Finally, in pools with incomplete information, the number of drawn games can be estimated by adopting a draw-propensity parameter from related data with more complete information.
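The two-stage idea described above (strengths first, then a draw-propensity parameter with the strengths held fixed) might be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the abstract gives no formulas, so the sketch assumes the standard Davidson form, in which a draw between players i and j has probability ν√(π_i π_j) / (π_i + π_j + ν√(π_i π_j)), and it assumes that stage one scores each draw as half a win for both players. The function names `fit_strengths` and `fit_draw_propensity` and the `results` data layout are hypothetical.

```python
import math

# Hypothetical data layout: {(player_a, player_b): (wins_a, wins_b, draws)}

def fit_strengths(results, n_iter=500):
    """Stage 1 (assumed): Bradley-Terry strengths via fixed-point updates,
    scoring each draw as half a win for both players."""
    players = sorted({p for pair in results for p in pair})
    pi = {p: 1.0 / len(players) for p in players}
    for _ in range(n_iter):
        new = {}
        for p in players:
            score, denom = 0.0, 0.0
            for (a, b), (wa, wb, d) in results.items():
                if p not in (a, b):
                    continue
                score += (wa if p == a else wb) + 0.5 * d
                denom += (wa + wb + d) / (pi[a] + pi[b])
            new[p] = score / denom
        total = sum(new.values())
        pi = {p: v / total for p, v in new.items()}  # normalise to sum 1
    return pi

def fit_draw_propensity(results, pi, n_bisect=200):
    """Stage 2 (assumed): with strengths fixed, choose nu so that the
    expected number of draws under the Davidson form equals the observed
    number. This is the likelihood stationarity condition in nu alone."""
    draws = sum(d for (_, _, d) in results.values())
    if draws == 0:
        return 0.0

    def excess(nu):
        expected = 0.0
        for (a, b), (wa, wb, d) in results.items():
            g = math.sqrt(pi[a] * pi[b])
            expected += (wa + wb + d) * nu * g / (pi[a] + pi[b] + nu * g)
        return draws - expected  # decreasing in nu

    lo, hi = 0.0, 1.0
    while excess(hi) > 0 and hi < 1e12:  # expand the bracket upward
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

results = {("A", "B"): (3, 3, 4)}
pi = fit_strengths(results)
nu = fit_draw_propensity(results, pi)
```

For the toy data of two evenly matched players with 4 draws in 10 games, the fitted ν reproduces a draw probability of 0.4. Because ν is solved for after the strengths are fixed, changing the draw count in related data alters ν without moving the strength estimates, which is the separation the abstract argues for.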
Similar Papers
Rating competitors in games with strength-dependent tie probabilities
Methodology
Makes game ratings fairer by counting ties.
Drawing Conclusions from Draws: Rethinking Preference Semantics in Arena-Style LLM Evaluation
Computation and Language
Makes AI rating systems fairer by understanding "draws."
Policies of Multiple Skill Levels for Better Strength Estimation in Games
Machine Learning (CS)
Helps computers guess player skill better.