Language models can learn implicit multi-hop reasoning, but only if they have lots of training data
By: Yuekun Yao, Yupei Du, Dawei Zhu, and more
Potential Business Impact:
Shows how much training data computers need to solve multi-step problems in one go.
Implicit reasoning is the ability of a language model to solve multi-hop reasoning tasks in a single forward pass, without chain of thought. We investigate this capability using GPT2-style language models trained from scratch on controlled $k$-hop reasoning datasets ($k = 2, 3, 4$). We show that while such models can indeed learn implicit $k$-hop reasoning, the required training data grows exponentially in $k$, and the required number of transformer layers grows linearly in $k$. We offer a theoretical explanation for why this depth growth is necessary. We further find that the data requirement can be mitigated, but not eliminated, through curriculum learning.
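To make the setup concrete, below is a minimal sketch of how a controlled $k$-hop reasoning dataset might be generated: the entity names, relation names (r0, r1, ...), and fact/query formats are illustrative assumptions for this sketch, not the paper's exact construction. The model is trained on single-hop facts plus composed queries whose answers require chaining $k$ lookups.

```python
import random

def make_khop_dataset(k=2, num_entities=50, num_examples=1000, seed=0):
    """Sketch: build single-hop facts and k-hop (query, answer) pairs.

    Each relation r0..r{k-1} is a random lookup table over the entities.
    Answering a k-hop query requires composing all k lookups.
    """
    rng = random.Random(seed)
    entities = [f"e{i}" for i in range(num_entities)]

    # One random single-hop relation (a lookup table) per hop.
    relations = [{e: rng.choice(entities) for e in entities} for _ in range(k)]

    # Atomic facts the model can memorize, e.g. "r0(e3) = e17".
    facts = []
    for hop, rel in enumerate(relations):
        for src, dst in rel.items():
            facts.append(f"r{hop}({src}) = {dst}")

    # Composed queries, e.g. "r1(r0(e3)) = ?" with the chained answer.
    examples = []
    for _ in range(num_examples):
        x = rng.choice(entities)
        answer = x
        for rel in relations:
            answer = rel[answer]
        query = x
        for hop in range(k):
            query = f"r{hop}({query})"
        examples.append((f"{query} = ?", answer))

    return facts, examples

facts, qa = make_khop_dataset(k=3)
print(facts[0])  # a single-hop fact, e.g. "r0(e0) = e17"
print(qa[0])     # a 3-hop query/answer pair, e.g. ("r2(r1(r0(e42))) = ?", "e5")
```

Under this kind of construction, the number of distinct $k$-hop compositions grows rapidly with $k$, which gives an intuition for why the training data needed for implicit reasoning scales with the hop count.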
Similar Papers
How does Transformer Learn Implicit Reasoning?
Machine Learning (CS)
Teaches computers to think step-by-step.
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Computation and Language
Teaches computers to solve problems by copying patterns.
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Computation and Language
Makes computers solve hard problems with fewer parts.