Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations

Published: October 28, 2025 | arXiv ID: 2510.24250v1

By: Syed Zohaib Hassan, Pål Halvorsen, Miriam S. Johnson and more

Potential Business Impact:

Makes computers talk like young kids.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large Language Models (LLMs), predominantly trained on adult conversational data, face significant challenges when generating authentic, child-like dialogue for specialized applications. We present a comparative study evaluating how well five LLMs (GPT-4, RUTER-LLAMA-2-13b, GPTSW, NorMistral-7b, and NorBloom-7b) generate age-appropriate Norwegian conversations for children aged 5 and 9 years. In a blind evaluation, eleven education professionals assessed both real child interview data and LLM-generated text samples for authenticity and developmental appropriateness. Our results show that evaluators achieved strong inter-rater reliability (ICC=0.75) and predicted age more accurately for younger children (5-year-olds) than for older children (9-year-olds). While GPT-4 and NorBloom-7b performed relatively well, most models generated language perceived as more linguistically advanced than the target age groups. These findings highlight critical data-related challenges in developing LLM systems for specialized applications involving children, particularly in low-resource languages where comprehensive age-appropriate lexical resources are scarce.

Large Language Models for Education and Research: An Empirical and User Survey-based Analysis

Artificial Intelligence

Helps students and researchers learn and solve problems.

8 Dec 2025

Bridging the Early Science Gap with Artificial Intelligence: Evaluating Large Language Models as Tools for Early Childhood Science Education

Human-Computer Interaction

AI helps teachers explain science to young kids.

2 Jan 2025

Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability

Computation and Language

Computers aren't getting more creative, even the best ones.

10 Apr 2025

Page Count

20 pages
