A Sustainable AI Economy Needs Data Deals That Work for Generators
By: Ruoxi Jia, Luis Oala, Wenjie Xiong and more
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each stage in the data cycle, from inputs to model weights to synthetic outputs, refines technical signal but strips economic equity from data generators. By analyzing seventy-three public data deals, we show that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults (missing provenance, asymmetric bargaining power, and non-dynamic pricing) as the operational machinery of this inequality. We trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals, and we contextualize our position with related and orthogonal viewpoints.
Similar Papers
The Economics of AI Training Data: A Research Agenda
Computers and Society
Helps make fair prices for computer learning data.
Fairshare Data Pricing via Data Valuation for Large Language Models
CS and Game Theory
Fair pay for data makes AI smarter.
The Cost of Balanced Training-Data Production in an Online Data Market
CS and Game Theory
Makes "ethical" data markets work for AI.