The Digital Sous Chef -- A Comparative Study on Fine-Tuning Language Models for Recipe Generation
By: Shubham Pundhir, Ganesh Bagler
Potential Business Impact:
Makes computers write better recipes with exact amounts.
We establish a rigorous benchmark for text-based recipe generation, a fundamental task in natural language generation. We present a comprehensive comparative study contrasting a fine-tuned GPT-2 Large (774M) model against the GPT-2 Small (124M) model and traditional LSTM/RNN baselines on the 5-cuisine corpus from RecipeDB. Our key contribution is a targeted tokenization strategy that augments the vocabulary with 23 common fraction tokens and custom structural markers. This approach addresses a critical limitation of generic tokenizers by preserving essential recipe structures and precise numerical quantities, thereby enhancing domain specificity. Performance is evaluated using a comprehensive suite of seven automatic metrics spanning fluency (BLEU-4, METEOR), coherence (ROUGE-L), semantic relevance (BERTScore), and diversity. Our experiments show that the large transformer-based approach yields over a 20% relative improvement in BERTScore (F1) (0.92 vs. 0.72) over the best recurrent baseline, while reducing perplexity by 69.8%. We conclude with a discussion of remaining challenges, particularly regarding factual accuracy, and outline how this foundational study paves the way for integrating real-world constraints and multi-modal inputs in advanced recipe generation research.
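The vocabulary-augmentation idea described above can be sketched in plain Python. The snippet below is a minimal illustration, not the authors' implementation: the specific marker names and the exact set of fraction tokens are assumptions (the paper states only that 23 fraction tokens and custom structural markers were added). It shows how a pre-tokenizer can keep fractions like "1/2" and section markers whole, so they map to single vocabulary entries rather than being split by a generic tokenizer.

```python
import re

# Hypothetical structural markers; the paper's actual marker names are not given.
STRUCT_MARKERS = ["<RECIPE_START>", "<TITLE>", "<INGREDIENTS>",
                  "<INSTRUCTIONS>", "<RECIPE_END>"]

# A plausible set of common cooking fractions. The paper adds 23 such
# tokens; this particular list is an illustrative assumption.
FRACTIONS = ["1/2", "1/3", "2/3", "1/4", "3/4",
             "1/5", "2/5", "3/5", "4/5",
             "1/6", "5/6", "1/8", "3/8", "5/8", "7/8"]

def added_vocab():
    """Tokens appended to the base vocabulary so recipe structure and
    numeric quantities survive tokenization as single units."""
    return STRUCT_MARKERS + FRACTIONS

# Match fractions first, then markers, then any other whitespace-free run,
# so "1/2" and "<TITLE>" are never split into sub-pieces.
_TOKEN_RE = re.compile(r"\d+/\d+|<[A-Z_]+>|\S+")

def pre_tokenize(text):
    """Whitespace-style split that keeps fractions and markers intact."""
    return _TOKEN_RE.findall(text)
```

With a library such as HuggingFace `transformers`, the same list would typically be passed to the tokenizer's `add_tokens` method, followed by resizing the model's embedding matrix to accommodate the new entries.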
Similar Papers
Fine-tuning Language Models for Recipe Generation: A Comparative Analysis and Benchmark Study
Computation and Language
Makes computers write new, safe recipes.
Comparison of Large Language Models for Deployment Requirements
Computation and Language
Helps pick the best AI for your needs.
Evaluation of GPT-based large language generative AI models as study aids for the national licensure examination for registered dietitians in Japan
Computation and Language
AI helps nutrition students study for tests.