Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models
By: Amartya Roy, Elamparithy M, Kripabandhu Ghosh, and more
Potential Business Impact:
Helps computers reason better, even when the input is not natural language.
In-context learning (ICL) underpins recent advances in large language models (LLMs), yet its role and performance in causal reasoning remain unclear. Causal reasoning demands multi-hop composition and strict conjunctive control, and reliance on spurious lexical cues in the input can produce misleading results. We hypothesize that, due to their ability to project the input into a latent space, encoder and encoder-decoder architectures are better suited to such multi-hop conjunctive reasoning than decoder-only models. To test this, we compare fine-tuned versions of all three architectures against zero- and few-shot ICL in both natural-language and non-natural-language scenarios. We find that ICL alone is insufficient for reliable causal reasoning, often over-focusing on irrelevant input features. In particular, decoder-only models are noticeably brittle under distributional shifts, whereas fine-tuned encoder and encoder-decoder models generalize more robustly across our tests, including the non-natural-language split. Decoder-only models match or surpass these architectures only at large scales. We conclude that for cost-effective, short-horizon, robust causal reasoning, encoder or encoder-decoder architectures with targeted fine-tuning are preferable.
Similar Papers
Mitigating Hallucinations in Large Language Models via Causal Reasoning
Computation and Language
Teaches computers to think logically, reducing fabricated answers.
Unsupervised decoding of encoded reasoning using language model interpretability
Artificial Intelligence
Uncovers how AI thinks, even when its reasoning is hidden.
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking
Artificial Intelligence
Makes smart computers think better, even when complex.