Exploration of Summarization by Generative Language Models for Automated Scoring of Long Essays
By: Haowei Hua, Hong Jiao, Xinyi Wang
Potential Business Impact:
Scores long essays better by summarizing them.
BERT and its variants have been extensively explored for automated scoring. However, the 512-token input limit of these encoder-based models makes them ill-suited to scoring long essays. This research therefore explores generative language models for automated scoring of long essays via summarization and prompting. The results show a substantial improvement in scoring accuracy, with QWK increasing from 0.822 to 0.8878 on the Learning Agency Lab Automated Essay Scoring 2.0 dataset.
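The summarize-then-score pipeline the abstract describes can be sketched as below. This is a minimal illustration assuming the Hugging Face transformers pipeline API; the model names, prompt wording, and score scale are placeholders for exposition, not the authors' actual configuration.

```python
# Minimal sketch of the summarize-then-score idea from the abstract, assuming
# the Hugging Face transformers pipeline API. The model names, prompt wording,
# and score scale below are illustrative placeholders, not the authors' setup.
from transformers import pipeline

# Step 1: compress a long essay so it fits inside a scorer's context window.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Step 2: prompt a generative model to assign a holistic score to the summary.
generator = pipeline("text-generation", model="gpt2")  # placeholder scorer

def score_essay(essay: str) -> str:
    # Summarize; truncation guards against inputs beyond the model's limit.
    summary = summarizer(essay, max_length=200, truncation=True)[0]["summary_text"]
    prompt = (
        "Score the following essay summary on a 1-6 holistic scale. "
        "Reply with the score only.\n\n"
        f"Summary: {summary}\n\nScore:"
    )
    out = generator(prompt, max_new_tokens=5, return_full_text=False)
    return out[0]["generated_text"].strip()
```

Agreement with human raters would then be measured with quadratic weighted kappa, e.g. scikit-learn's cohen_kappa_score(human_scores, predicted_scores, weights="quadratic").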
Similar Papers
Long Context Automated Essay Scoring with Language Models
Computation and Language
Lets computers grade long essays in full, without truncation.
Automated Refinement of Essay Scoring Rubrics for Language Models via Reflect-and-Revise
Computation and Language
Refines scoring rubrics so computers grade essays more like humans.
Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring
Machine Learning (CS)
Uses model rationales to help computers grade essays more accurately.