Measuring the Hidden Cost of Data Valuation through Collective Disclosure
By: Patrick Mesana, Gilles Caporossi, Sébastien Gambs
Potential Business Impact:
Helps pay people fairly for their data.
Data valuation methods assign a marginal utility to each data point that has contributed to the training of a machine learning model. If used directly as a payout mechanism, this creates a hidden cost of valuation: contributors with near-zero marginal value would receive nothing, even though their data still had to be collected and assessed. To formalize this cost, we introduce a conceptual and game-theoretic model, the Information Disclosure Game, played between a Data Union (DU, sometimes also called a data trust), a member-run agent representing contributors, and a Data Consumer (e.g., a platform). After first aggregating its members' data, the DU releases information progressively, adding Laplace noise through a differentially private mechanism. Through simulations with strategies guided by Data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.
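The disclosure step the abstract describes, a DU releasing progressively sharper views of an aggregate under differential privacy, can be sketched with the standard Laplace mechanism (noise scale = sensitivity / epsilon). The snippet below is a minimal illustration under assumptions, not the paper's implementation; the function name disclose_round, the toy aggregate, and the per-round budgets are all hypothetical.

```python
import numpy as np

# Minimal sketch of progressive disclosure via the Laplace mechanism.
# A Data Union (DU) holds an aggregate of members' data and releases
# one noisy view per round; later rounds spend a larger privacy budget,
# so the Data Consumer sees successively sharper estimates.
rng = np.random.default_rng(0)

def disclose_round(aggregate: np.ndarray,
                   sensitivity: float,
                   epsilon: float) -> np.ndarray:
    """Release one noisy view of the DU's aggregate statistics.

    Standard Laplace mechanism: noise with scale b = sensitivity / epsilon
    makes this single release epsilon-differentially private.
    """
    scale = sensitivity / epsilon
    return aggregate + rng.laplace(loc=0.0, scale=scale, size=aggregate.shape)

aggregate = np.array([0.42, 0.17, 0.91])   # toy aggregated statistics
for epsilon in [0.1, 0.5, 1.0]:            # increasing per-round budget
    noisy = disclose_round(aggregate, sensitivity=1.0, epsilon=epsilon)
    print(f"epsilon={epsilon}: {noisy}")
```

In the paper's game, the choice of how fast this budget is spent is the DU's collective disclosure policy; the sketch only shows the noise-calibration mechanics that such a policy would control.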
Similar Papers
From Fairness to Truthfulness: Rethinking Data Valuation Design
CS and Game Theory
Pays people fairly for data used by AI.
Designing DSIC Mechanisms for Data Sharing in the Era of Large Language Models
CS and Game Theory
Gets better AI by fairly paying for good data.