Measuring the Hidden Cost of Data Valuation through Collective Disclosure

Published: October 9, 2025 | arXiv ID: 2510.08869v2

By: Patrick Mesana, Gilles Caporossi, Sebastien Gambs

Potential Business Impact:

Helps fairly pay people for their data.

Business Areas:
Data Mining Data and Analytics, Information Technology

Data valuation methods assign a marginal utility to each data point that contributed to the training of a machine learning model. Used directly as a payout mechanism, such valuations create a hidden cost: contributors with near-zero marginal value receive nothing, even though their data had to be collected and assessed. To formalize this cost, we introduce a conceptual and game-theoretic model, the Information Disclosure Game, played between a Data Union (DU; sometimes also called a data trust), which is a member-run agent representing contributors, and a Data Consumer (e.g., a platform). After first aggregating members' data, the DU releases information progressively by adding Laplacian noise under a differentially private mechanism. Through simulations with strategies guided by data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.
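The progressive, noise-based disclosure described above can be illustrated with a minimal sketch of the standard Laplace mechanism. This is not the authors' implementation: the choice of a clipped mean as the aggregate, the function names `laplace_noise` and `progressive_release`, the example ratings, and the per-round budgets are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def progressive_release(values, lower, upper, epsilons, seed=0):
    """Release a noisy mean once per round (hypothetical helper).

    Assumption: the aggregate is a mean of values clipped to [lower, upper],
    so replacing one member's value changes it by at most (upper - lower) / n.
    """
    rng = random.Random(seed)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)

    releases = []
    for eps in epsilons:
        # Laplace mechanism: noise scale = sensitivity / epsilon.
        scale = sensitivity / eps
        releases.append((eps, true_mean + laplace_noise(scale, rng)))
    return releases


if __name__ == "__main__":
    # Hypothetical member data: star ratings between 1 and 5.
    ratings = [5, 4, 4, 3, 5, 2, 4, 1, 5, 3]
    # Larger epsilon in later rounds means less noise, i.e. more disclosure.
    for eps, noisy_mean in progressive_release(ratings, 1, 5, [0.1, 0.5, 2.0]):
        print(f"epsilon={eps}: released mean = {noisy_mean:.3f}")
```

Under basic sequential composition, the total privacy budget spent across rounds is the sum of the per-round epsilons, so each additional release in this sketch discloses more about the aggregated data.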

Country of Origin
🇨🇦 Canada

Page Count
12 pages

Category
Computer Science:
Computer Science and Game Theory