Reconstructing Item Characteristic Curves using Fine-Tuned Large Language Models
By: Christopher Ormerod
Potential Business Impact:
Makes tests better without needing real students.
Traditional methods for determining assessment item parameters, such as difficulty and discrimination, rely heavily on expensive field testing to collect student performance data for Item Response Theory (IRT) calibration. This study introduces a novel approach that implicitly models these psychometric properties by fine-tuning Large Language Models (LLMs) to simulate student responses across a spectrum of latent abilities. Leveraging the Qwen-3 dense model series and Low-Rank Adaptation (LoRA), we train models to generate responses to multiple-choice questions conditioned on discrete ability descriptors. We then reconstruct the probability of a correct response as a function of student ability, effectively generating synthetic Item Characteristic Curves (ICCs) from which IRT parameters are estimated. Evaluation on a dataset of Grade 6 English Language Arts (ELA) items and the BEA 2024 Shared Task dataset demonstrates that this method matches or outperforms baseline approaches, and the simulation-based technique appears particularly effective at modeling item discrimination.
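As a rough illustration of the ICC reconstruction step described above, the sketch below assumes a two-parameter logistic (2PL) IRT model and hypothetical correct-response rates at five discrete ability levels; the ability grid, response rates, and fitting routine are illustrative assumptions, not the paper's actual prompts or estimation procedure.

# Illustrative sketch: fit a 2PL IRT model to a synthetic ICC built from
# simulated per-ability correct-response rates (hypothetical values).
import numpy as np
from scipy.optimize import curve_fit

def icc_2pl(theta, a, b):
    """2PL IRT model: P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Discrete latent-ability levels used to condition the fine-tuned LLM,
# mapped onto a theta scale (assumed values).
theta_levels = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Hypothetical fraction of simulated responses that were correct at each
# ability level, e.g., from sampling the model many times per item.
p_correct = np.array([0.12, 0.31, 0.58, 0.81, 0.93])

# Estimate discrimination (a) and difficulty (b) from the synthetic ICC points.
(a_hat, b_hat), _ = curve_fit(icc_2pl, theta_levels, p_correct, p0=[1.0, 0.0])
print(f"Estimated discrimination a = {a_hat:.2f}, difficulty b = {b_hat:.2f}")

In this sketch, the fitted a and b play the roles of the IRT discrimination and difficulty parameters that would otherwise require field-test data to calibrate.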
Similar Papers
Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms
Computers and Society
Helps guess how hard school questions are.
Can LLMs Reliably Simulate Real Students' Abilities in Mathematics and Reading Comprehension?
Computation and Language
Makes AI tutors act like real students.
Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length
Methodology
Tests AI thinking speed and accuracy better.