Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
By: Renpu Liu, Jing Yang
Large language models (LLMs) exhibit impressive in-context learning (ICL) capabilities, yet the quality of their predictions is fundamentally limited by the few costly labeled demonstrations that can fit into a prompt. Meanwhile, there exist vast and continuously growing amounts of unlabeled data that may be closely related to the ICL task. How to leverage such unlabeled data to provably enhance ICL performance is thus a fundamental open question. In this work, we propose a novel augmented ICL framework, in which the prompt includes a small set of labeled examples alongside a block of unlabeled inputs. We focus on the multi-class linear classification setting and demonstrate that, with chain-of-thought (CoT) prompting, a multi-layer transformer can effectively emulate an expectation-maximization (EM) algorithm. This enables the transformer to implicitly extract useful information from both labeled and unlabeled data, leading to provable improvements in ICL accuracy. Moreover, we show that such a transformer can be trained via teacher forcing, with its parameters converging to the desired solution at a linear rate. Experiments show that the augmented ICL framework consistently outperforms conventional few-shot ICL, providing empirical support for our theoretical findings. To the best of our knowledge, this is the first theoretical study on the impact of unlabeled data on the ICL performance of transformers.
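To make the mechanism concrete, the sketch below runs a semi-supervised EM procedure of the kind the abstract says a transformer can emulate in-context: the few labeled demonstrations initialize per-class means of a spherical Gaussian mixture, and the unlabeled block refines them via E/M updates, yielding a linear classifier. This is a minimal NumPy illustration, not the paper's transformer construction or its exact algorithm; the function name, model assumptions (equal priors, unit covariance), and all sizes and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's construction): EM-style
# semi-supervised estimation for multi-class linear classification.
import numpy as np

def em_augmented_icl(X_lab, y_lab, X_unlab, K, n_em_steps=10):
    """Labeled demonstrations initialize class means; the unlabeled
    block refines them through EM on a spherical Gaussian mixture."""
    # Initialization: per-class means from the few labeled examples.
    mu = np.stack([X_lab[y_lab == k].mean(axis=0) for k in range(K)])
    for _ in range(n_em_steps):
        # E-step: soft class posteriors for the unlabeled inputs.
        # Under equal priors and unit covariance this is a softmax over
        # the linear scores x @ mu_k - ||mu_k||^2 / 2.
        logits = X_unlab @ mu.T - 0.5 * (mu ** 2).sum(axis=1)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each mean from labeled data (hard labels)
        # plus unlabeled data weighted by the soft responsibilities.
        for k in range(K):
            w_hard = (y_lab == k).astype(float)
            num = w_hard @ X_lab + resp[:, k] @ X_unlab
            den = w_hard.sum() + resp[:, k].sum()
            mu[k] = num / den
    return mu  # predict with argmax_k of the linear scores above

# Illustrative usage on synthetic data (all sizes are assumptions).
rng = np.random.default_rng(0)
K, d, n_lab, n_unlab = 3, 8, 6, 200
true_mu = rng.normal(size=(K, d)) * 3.0
y_lab = np.repeat(np.arange(K), n_lab // K)
X_lab = true_mu[y_lab] + rng.normal(size=(n_lab, d))
y_unlab = rng.integers(K, size=n_unlab)
X_unlab = true_mu[y_unlab] + rng.normal(size=(n_unlab, d))

mu_hat = em_augmented_icl(X_lab, y_lab, X_unlab, K)
pred = np.argmax(X_unlab @ mu_hat.T - 0.5 * (mu_hat ** 2).sum(axis=1), axis=1)
print("accuracy on unlabeled block:", (pred == y_unlab).mean())
```

The key point the sketch captures is that the M-step pools hard statistics from the labeled examples with soft statistics from the unlabeled block, so the final classifier uses strictly more information than a few-shot-only estimate; the paper's contribution is showing a CoT-prompted multi-layer transformer can realize such updates implicitly and provably benefit from them.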
Similar Papers
In-Context Semi-Supervised Learning
Machine Learning (CS)
Helps computers learn from fewer labeled examples.
When and How Unlabeled Data Provably Improve In-Context Learning
Machine Learning (CS)
Computers can learn better even from unlabeled examples.
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
Machine Learning (CS)
AI learns tasks quickly, but its efficiency drops as the context grows long.