Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
By: Ming Li, Han Chen, Yunze Xiao, and more
Potential Business Impact:
Today's AI can't reliably tell how hard a question is for a human learner, so automated test design still needs human difficulty data.
Accurate estimation of item (question or task) difficulty is critical for educational assessment but suffers from a cold-start problem: newly written items have no student response data from which to calibrate difficulty. While Large Language Models demonstrate superhuman problem-solving capabilities, it remains an open question whether they can perceive the cognitive struggles of human learners. In this work, we present a large-scale empirical analysis of Human-AI Difficulty Alignment for over 20 models across diverse domains such as medical knowledge and mathematical reasoning. Our findings reveal a systematic misalignment in which scaling up model size does not reliably help; instead of aligning with humans, models converge toward a shared machine consensus. We observe that high performance often impedes accurate difficulty estimation: models struggle to simulate the capability limitations of students even when explicitly prompted to adopt specific proficiency levels. Furthermore, we identify a critical lack of introspection, as models fail to predict their own limitations. These results suggest that general problem-solving capability does not imply an understanding of human cognitive struggles, highlighting the challenge of using current models for automated difficulty prediction.
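To make the setup concrete, here is a minimal sketch of what a proficiency-simulation probe of this kind could look like. The `query_llm` wrapper, the proficiency labels, the prompt wording, and the 1-5 rating scale are all illustrative assumptions rather than the paper's actual protocol, and Spearman rank correlation is used here as one plausible way to score alignment with human difficulty data.

```python
# Sketch: proficiency-conditioned difficulty estimation and human alignment scoring.
# All prompts, labels, and scales below are assumptions for illustration only.

from scipy.stats import spearmanr

PROFICIENCY_LEVELS = ["novice student", "average student", "advanced student"]


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion client."""
    raise NotImplementedError


def estimate_difficulty(item_text: str, level: str) -> float:
    """Ask the model to role-play a proficiency level and rate item difficulty 1-5."""
    prompt = (
        f"You are a {level}. Read the question below and rate how difficult "
        f"it would be for you to answer correctly, on a scale from 1 (very easy) "
        f"to 5 (very hard). Reply with a single number.\n\nQuestion: {item_text}"
    )
    return float(query_llm(prompt).strip())


def alignment(items: list[str], human_difficulty: list[float], level: str) -> float:
    """Spearman rank correlation between model-estimated and human item difficulty."""
    predicted = [estimate_difficulty(item, level) for item in items]
    rho, _ = spearmanr(predicted, human_difficulty)
    return rho
```

Under this framing, the paper's misalignment finding would show up as low (or mutually similar across models) correlations with human difficulty regardless of which proficiency level the model is told to adopt.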
Similar Papers
Estimating problem difficulty without ground truth using Large Language Model comparisons
Machine Learning (CS)
Estimates how hard problems are by having LLMs compare them, without needing human difficulty labels.
Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms
Computers and Society
Predicts how hard school test questions are using LLMs and tree-based models.
LLMs Encode How Difficult Problems Are
Computation and Language
Shows that LLMs internally represent how difficult problems are.