Measuring the Hidden Cost of Data Valuation through Collective Disclosure

Published: October 9, 2025 | arXiv ID: 2510.08869v2

By: Patrick Mesana, Gilles Caporossi, Sebastien Gambs

Potential Business Impact:

Helps fairly pay people for their data.

Business Areas:

Data Mining, Data and Analytics, Information Technology

Data valuation methods assign a marginal utility to each data point that contributed to the training of a machine learning model. Used directly as a payout mechanism, these valuations create a hidden cost: contributors with near-zero marginal value would receive nothing, even though their data had to be collected and assessed. To formalize this cost, we introduce a conceptual, game-theoretic model, the Information Disclosure Game, between a Data Union (DU, sometimes also called a data trust), a member-run agent representing contributors, and a Data Consumer (e.g., a platform). After first aggregating its members' data, the DU releases information progressively by adding Laplace noise under a differentially private mechanism. Through simulations with strategies guided by data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.
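To make the progressive, noise-based disclosure more concrete, the following is a minimal sketch of the standard Laplace mechanism applied over a sequence of releases. It is not the authors' implementation: the function names, the epsilon schedule, the aggregate statistic (a mean score clipped to [0, 1] over 100 members), and the resulting sensitivity are all assumptions chosen for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def progressive_release(true_value, sensitivity, epsilons, rng):
    """Return one (epsilon, scale, noisy_value) triple per release.

    `epsilons` is a hypothetical disclosure schedule: a larger epsilon
    means less noise, so more information reaches the consumer.
    """
    releases = []
    for eps in epsilons:
        b = laplace_scale(sensitivity, eps)
        releases.append((eps, b, true_value + sample_laplace(b, rng)))
    return releases


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed aggregate: mean of scores clipped to [0, 1] over 100 members,
    # so changing one member's record moves the mean by at most 1/100.
    true_mean = 0.62
    sensitivity = 1 / 100
    for eps, b, noisy in progressive_release(
        true_mean, sensitivity, [0.1, 0.5, 1.0, 5.0], rng
    ):
        print(f"epsilon={eps:<4} scale={b:.4f} released={noisy:.4f}")
```

Each release in such a schedule spends privacy budget: under basic sequential composition, the total privacy loss is at most the sum of the per-release epsilons. How the paper's Data Union accounts for this budget is not stated in the abstract.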