Enhancing Arabic Automated Essay Scoring with Synthetic Data and Error Injection
By: Chatrine Qwaider, Bashar Alhafni, Kirill Chirkunov, and more
Potential Business Impact:
Teaches computers to grade Arabic essays better.
Automated Essay Scoring (AES) plays a crucial role in assessing language learners' writing quality, reducing grading workload, and providing real-time feedback. The lack of annotated essay datasets inhibits the development of Arabic AES systems. This paper leverages Large Language Models (LLMs) and Transformer models to generate synthetic Arabic essays for AES. We prompt an LLM to generate essays across the Common European Framework of Reference (CEFR) proficiency levels and introduce and compare two approaches to error injection. We create a dataset of 3,040 annotated essays with errors injected using our two methods. Additionally, we develop a BERT-based Arabic AES system calibrated to CEFR levels. Our experimental results demonstrate the effectiveness of our synthetic dataset in improving Arabic AES performance. We make our code and data publicly available.
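To make the scoring side of the pipeline concrete, below is a minimal, hypothetical sketch of fine-tuning a BERT-based classifier over CEFR proficiency levels using Hugging Face Transformers. The checkpoint name (aubmindlab/bert-base-arabertv2), the toy examples, and the hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: BERT-based CEFR-level classification for Arabic essays.
# Checkpoint, label set granularity, and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # standard CEFR scale
label2id = {lvl: i for i, lvl in enumerate(CEFR_LEVELS)}

# Toy placeholders standing in for the synthetic annotated essays.
examples = [
    {"text": "نص مقال تجريبي", "label": label2id["B1"]},
    {"text": "نص مقال آخر", "label": label2id["A2"]},
]
dataset = Dataset.from_list(examples)

model_name = "aubmindlab/bert-base-arabertv2"  # assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=len(CEFR_LEVELS)
)

def tokenize(batch):
    # Truncate/pad essays to a fixed length for the encoder.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="arabic-aes-cefr",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```

In practice, the training set would be the synthetic essays with injected errors, and evaluation would compare predicted CEFR levels against the annotated ones.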
Similar Papers
Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems
Computation and Language
Makes essay grading smarter by finding mistakes.
How well can LLMs Grade Essays in Arabic?
Computation and Language
Helps computers grade Arabic essays better.
Automatic Essay Scoring and Feedback Generation in Basque Language Learning
Computation and Language
Helps computers grade essays and give feedback.