Conscious Data Contribution via Community-Driven Chain-of-Thought Distillation
By: Lena Libon , Meghana Bhange , Rushabh Solanki and more
The current era of AI development places a heavy emphasis on training large models on increasingly scaled-up datasets. This paradigm has catalyzed entirely new product categories, such as LLM chatbots, while also raising concerns about data privacy and consumer choice. In this paper, we consider questions of data portability and user autonomy in the context of LLMs that "reason" using chain-of-thought (CoT) traces, computing intermediate text artifacts from user input before producing a final output. We first interpret recent data privacy and portability law to argue that these intermediate computations qualify as users' personal data. Then, building on the existing framework of Conscious Data Contribution, we show how communities who receive low utility from an available model can aggregate and distill their shared knowledge into an alternate model better aligned with their goals. We verify this approach empirically and investigate the effects of community diversity, reasoning granularity, and community size on distillation performance.
Similar Papers
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Artificial Intelligence
Computers' "thinking" breaks when problems change.
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Computation and Language
Helps AI "think step-by-step" to solve harder problems.
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
Computation and Language
Helps AI "think" step-by-step to solve harder problems.