Beyond Shallow Heuristics: Leveraging Human Intuition for Curriculum Learning
By: Vanessa Toborek, Sebastian Müller, Tim Selbach, and more
Potential Business Impact:
Teaches computers by showing them easy words first.
Curriculum learning (CL) aims to improve training by presenting data from "easy" to "hard", yet defining and measuring linguistic difficulty remains an open challenge. We investigate whether human-curated simple language can serve as an effective signal for CL. Using the article-level labels from the Simple Wikipedia corpus, we compare label-based curricula to competence-based strategies relying on shallow heuristics. Our experiments with a BERT-tiny model show that adding simple data alone yields no clear benefit. However, structuring it via a curriculum -- especially when introduced first -- consistently improves perplexity, particularly on simple language. In contrast, competence-based curricula lead to no consistent gains over random ordering, probably because they fail to effectively separate the two classes. Our results suggest that human intuition about linguistic difficulty can guide CL for language model pre-training.
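To make the two ordering strategies concrete, here is a minimal sketch of a simple-first, label-based curriculum next to a shallow-heuristic (competence-style) baseline. The toy corpus, the `is_simple` field, and the sentence-length proxy are illustrative assumptions for exposition, not the authors' implementation.

```python
import random

# Toy corpus: each article carries a human-curated "simple" label
# (as in Simple Wikipedia) plus its raw text. Fields are illustrative.
corpus = [
    {"text": "The cat sat on the mat.", "is_simple": True},
    {"text": "Quantum chromodynamics describes the strong interaction.", "is_simple": False},
    {"text": "Dogs are friendly animals.", "is_simple": True},
    {"text": "The eigenvalues of a Hermitian operator are real.", "is_simple": False},
]

def label_based_curriculum(articles, seed=0):
    """Simple-first ordering: all human-labelled simple articles first,
    then the remaining articles, shuffled within each block."""
    rng = random.Random(seed)
    simple = [a for a in articles if a["is_simple"]]
    regular = [a for a in articles if not a["is_simple"]]
    rng.shuffle(simple)
    rng.shuffle(regular)
    return simple + regular

def heuristic_curriculum(articles):
    """Competence-style baseline: sort by a shallow difficulty proxy
    (here, average tokens per sentence), easy to hard."""
    def avg_sentence_length(article):
        sentences = [s for s in article["text"].split(".") if s.strip()]
        return sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return sorted(articles, key=avg_sentence_length)

if __name__ == "__main__":
    for a in label_based_curriculum(corpus):
        print("label-based:", a["text"])
    for a in heuristic_curriculum(corpus):
        print("heuristic  :", a["text"])
```

The ordered list produced by either function would then feed the pre-training data loader in place of random shuffling; the paper's finding is that the label-based, simple-first ordering helps, while the heuristic ordering does not reliably beat random order.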
Similar Papers
Influence-driven Curriculum Learning for Pre-training on Limited Data
Computation and Language
Teaches computers to learn faster by sorting lessons.
What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning
Machine Learning (CS)
Teaches computers math better by sorting problems.
Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
Computation and Language
Teaches computers to learn faster and better.