Language Generation: Complexity Barriers and Implications for Learning
By: Marcelo Arenas, Pablo Barceló, Luis Cofré, and more
Potential Business Impact:
Computers may need impossibly many examples to learn some languages.
Kleinberg and Mullainathan showed that, in principle, language generation is always possible: with sufficiently many positive examples, a learner can eventually produce sentences indistinguishable from those of a target language. However, the existence of such a guarantee does not speak to its practical feasibility. In this work, we show that even for simple and well-studied language families -- such as regular and context-free languages -- the number of examples required for successful generation can be extraordinarily large, and in some cases not bounded by any computable function. These results reveal a substantial gap between theoretical possibility and efficient learnability. They suggest that explaining the empirical success of modern language models requires a refined perspective -- one that takes into account structural properties of natural language that make effective generation possible in practice.
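To make the setting concrete, here is a minimal sketch (not from the paper) of generation in the limit for a hypothetical toy family of regular languages L_k = { a^n : n >= k }. The learner sees a growing stream of positive examples from an unknown target language in the family and, at each step, emits an unseen string from the smallest consistent language; the family, names, and strategy are illustrative assumptions chosen for simplicity.

    # Toy illustration of language generation in the limit,
    # in the spirit of Kleinberg & Mullainathan (not the paper's construction).
    # Family (assumed for illustration): L_k = { "a" * n : n >= k }, k = 1, 2, ...

    def contains(k: int, s: str) -> bool:
        """True iff s is in the toy regular language L_k = { a^n : n >= k }."""
        return set(s) <= {"a"} and len(s) >= k

    def generate(seen: list[str]) -> str:
        """Emit a fresh string from the smallest language consistent with
        the examples seen so far (largest k not ruled out by any example)."""
        k = min(len(s) for s in seen)   # shortest example bounds k from above
        n = k
        while "a" * n in seen:          # skip strings already presented
            n += 1
        return "a" * n

    # The target is L_3; an adversary enumerates it in some arbitrary order.
    stream = ["a" * n for n in (5, 3, 7, 4, 6)]
    seen: list[str] = []
    for example in stream:
        seen.append(example)
        g = generate(seen)
        # Every generated string must be unseen and must lie in the target L_3.
        assert g not in seen and contains(3, g)
        print(f"after {len(seen)} examples, generate {g!r}")

Because every example has length at least the target's true k, the smallest consistent language is always a subset of the target, so the learner generates correctly without ever identifying the target exactly; this is the distinction between generation and identification that the setting turns on. For this toy chain of languages success is immediate, whereas the paper's results show that for richer families such as regular and context-free languages the number of examples needed can be enormous or not bounded by any computable function.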
Similar Papers
Are the LLMs Capable of Maintaining at Least the Language Genus?
Computation and Language
Computers understand languages better when they're related.
Language Generation in the Limit: Noise, Loss, and Feedback
Data Structures and Algorithms
Teaches computers to learn languages despite noisy or missing examples.
Language Generation with Infinite Contamination
Machine Learning (Stat)
Teaches computers to learn languages even with mistakes.