Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays

Published: January 6, 2026 | arXiv ID: 2601.02659v2

By: Kuo Wang, Haowei Hua, Pengfei Yan and more

Potential Business Impact:

Helps computers grade long essays better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Long contexts can challenge encoder-only language models in text processing, particularly in automated essay scoring. This study trained several commonly used encoder-based language models to score long essays automatically. The performance of these models was evaluated and compared with that of ensemble models built on base language models with a 512-token limit. The models tested include BERT-based models (BERT, RoBERTa, DistilBERT, and DeBERTa), ensemble models integrating embeddings from multiple encoder models, and ensembles of feature-based supervised machine learning models, including Gradient-Boosted Decision Trees, eXtreme Gradient Boosting, and Light Gradient Boosting Machine. We trained, validated, and tested each model on a dataset of 17,307 essays with an 80%/10%/10% split and evaluated model performance using Quadratic Weighted Kappa. The study found that an ensemble-of-embeddings model, which combines representations from multiple pre-trained language models and uses a gradient-boosting classifier as the ensemble model, significantly outperforms individual language models at scoring long essays.
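For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of Quadratic Weighted Kappa in plain Python. It is not the authors' code. The function name, its parameters, and the score range in the usage example are illustrative assumptions, since the abstract does not state the scoring scale.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """Quadratic Weighted Kappa between two lists of integer scores.

    min_score and max_score define the rating scale; both are
    assumptions supplied by the caller, not values from the paper.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n_cat = max_score - min_score + 1
    if n_cat < 2:
        raise ValueError("the scale needs at least two score categories")
    n = len(y_true)

    # Observed agreement matrix: rows are true scores, columns are predictions.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_score][p - min_score] += 1

    # Marginal histograms, used to build the chance-agreement matrix.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            # Quadratic penalty: larger disagreements cost more.
            weight = (i - j) ** 2 / (n_cat - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n

    if weighted_expected == 0:
        # Both raters used one identical category throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on an assumed 1-4 scale.
perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
reversed_ = quadratic_weighted_kappa([1, 2, 3, 4], [4, 3, 2, 1], 1, 4)
```

A value of 1 indicates perfect agreement between predicted and human scores, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.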

Computation and Language

92%

Exploration of Summarization by Generative Language Models for Automated Scoring of Long Essays

Computation and Language

Scores long essays better by summarizing them.

26 Oct 2025 0

92%

Long Context Automated Essay Scoring with Language Models

Computation and Language

Lets computers grade long essays completely.

12 Sep 2025 1

Page Count

22 pages
