Masked Diffusion Language Models with Frequency-Informed Training
By: Despoina Kosmopoulou, Efthymios Georgiou, Vaggelis Dorovatas, and more
Potential Business Impact:
Teaches computers language with less text.
We present a masked diffusion language modeling framework for data-efficient training, developed for the BabyLM 2025 Challenge. Our approach applies diffusion training objectives to language modeling under strict data constraints, incorporating frequency-informed masking that prioritizes learning from rare tokens while maintaining theoretical validity. We explore multiple noise scheduling strategies, including two-mode approaches, and investigate different noise weighting schemes within the NELBO objective. We evaluate our method on the BabyLM benchmark suite, measuring linguistic competence, world knowledge, and human-likeness. Results show performance competitive with hybrid autoregressive-masked baselines, demonstrating that diffusion-based training offers a viable alternative for data-restricted language learning.
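To make the frequency-informed masking idea concrete, here is a minimal PyTorch-style sketch, assuming a linear noise schedule and an interpolation weight alpha; the function name, the schedule, and the mixing rule are illustrative assumptions rather than the paper's exact formulation.

import torch

def frequency_informed_mask(tokens, token_freqs, t, mask_id, alpha=0.5):
    # Hypothetical sketch of frequency-informed masking: at diffusion time
    # t in (0, 1], each position is masked with a probability that grows
    # with t (a linear schedule here) and is upweighted for rare tokens.
    # `token_freqs` is a [vocab_size] tensor of corpus frequencies; alpha
    # interpolates between uniform and rarity-weighted masking.
    freqs = token_freqs[tokens]                      # per-position corpus frequency
    rarity = 1.0 / (freqs + 1e-8)
    rarity = rarity / rarity.mean()                  # keep the average mask rate near t
    rate = (t * ((1.0 - alpha) + alpha * rarity)).clamp(0.0, 1.0)
    mask = torch.rand_like(rate) < rate              # Bernoulli masking decisions
    noised = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    return noised, mask

Setting alpha to 0 recovers uniform masking; larger values shift masking probability toward rare tokens, while the normalization keeps the expected mask rate tied to the schedule.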
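For orientation on where "noise weighting" enters, a common continuous-time form of the masked-diffusion NELBO (a sketch following standard masked-diffusion derivations; the schedule notation and parameterization are assumptions, not necessarily the paper's) is

\mathcal{L}_{\mathrm{NELBO}} = \mathbb{E}_{q} \int_{0}^{1} \frac{\alpha_t'}{1-\alpha_t} \sum_{\ell \,:\, z_t^{\ell} = \texttt{[MASK]}} \log p_\theta\!\left(x^{\ell} \mid z_t\right) dt,

where \alpha_t is the probability that a token survives unmasked at time t. Alternative noise weighting schemes replace the factor \alpha_t' / (1-\alpha_t) with a different weight w(t).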
Similar Papers
Soft-Masked Diffusion Language Models
Machine Learning (CS)
Helps computers write better code, faster.
Mask and You Shall Receive: Optimizing Masked Language Modeling For Pretraining BabyLMs
Computation and Language
Teaches computers to understand words better.
Learning Unmasking Policies for Diffusion Language Models
Machine Learning (CS)
Teaches computers to write better and faster.