Lexical Bundle Frequency as a Construct-Relevant Candidate Feature in Automated Scoring of L2 Academic Writing

Published: April 11, 2025 | arXiv ID: 2504.08537v1

By: Burak Senel

Potential Business Impact:

Helps computers grade writing more like teachers.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Automated scoring (AS) systems are increasingly used to evaluate L2 writing, but they require ongoing refinement to maintain construct validity. Prior work has suggested that lexical bundles (LBs), recurrent multi-word sequences that satisfy certain frequency criteria, could inform assessment, yet their empirical integration into AS models needs further investigation. This study tested the impact of incorporating LB frequency features into an AS model for TOEFL independent writing tasks. From a sampled subcorpus of the TOEFL11 corpus (N=1,225 essays, 9 L1s), scored by ETS-trained raters as Low, Medium, or High, the study extracted 3- to 9-word LBs and distinguished prompt-specific from non-prompt types. A baseline Support Vector Machine (SVM) scoring model using established linguistic features (e.g., mechanics, cohesion, sophistication) was compared against an extended model that added three aggregate LB frequency features (total prompt, total non-prompt, overall total). Results revealed significant, though generally small-effect, relationships between LB frequency (especially for non-prompt bundles) and proficiency (p < .05). Mean frequencies suggested that lower-proficiency essays used more LBs overall. Critically, the LB-enhanced model improved agreement with human raters (Quadratic Cohen's Kappa +2.05%, overall Cohen's Kappa +5.63%), with notable gains for low-proficiency (+10.1% exact agreement) and medium-proficiency (+14.3% Cohen's Kappa) essays. These findings indicate that integrating aggregate LB frequency offers potential for developing more linguistically informed and accurate AS systems, particularly for differentiating developing L2 writers.
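To make the feature construction concrete, the following is a minimal Python sketch of how 3- to 9-word bundles might be extracted from a set of essays and turned into the three aggregate counts named in the abstract. It is an illustration under stated assumptions, not the study's pipeline: the thresholds `min_freq` and `min_range`, the regex tokenizer, and the rule that a bundle is prompt-specific when it shares a word with a hypothetical `prompt_words` set are all placeholders, since the abstract does not specify them.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer (a simplification; assumption, not the paper's)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_bundles(essays, min_n=3, max_n=9, min_freq=2, min_range=2):
    """Collect n-grams that recur often enough to be treated as bundles.

    min_freq  : hypothetical minimum number of occurrences in the corpus
    min_range : hypothetical minimum number of distinct essays containing it
    """
    freq = Counter()       # total occurrences across the corpus
    doc_range = Counter()  # number of essays each n-gram appears in
    for text in essays:
        tokens = tokenize(text)
        seen = set()
        for n in range(min_n, max_n + 1):
            for gram in ngrams(tokens, n):
                freq[gram] += 1
                seen.add(gram)
        doc_range.update(seen)
    return {g for g in freq if freq[g] >= min_freq and doc_range[g] >= min_range}

def lb_features(text, bundles, prompt_words, min_n=3, max_n=9):
    """Aggregate bundle counts for one essay: prompt, non-prompt, total.

    A bundle counts as prompt-specific if it shares any word with
    prompt_words (an assumed rule for illustration only).
    Overlapping bundles are counted separately.
    """
    tokens = tokenize(text)
    prompt = non_prompt = 0
    for n in range(min_n, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in bundles:
                if prompt_words & set(gram):
                    prompt += 1
                else:
                    non_prompt += 1
    return {"prompt": prompt, "non_prompt": non_prompt, "total": prompt + non_prompt}

# Toy corpus, invented purely for illustration.
essays = [
    "On the other hand young people enjoy life",
    "On the other hand old people have more time",
    "I agree that young people enjoy life more",
]
bundles = extract_bundles(essays)
features = lb_features(essays[0], bundles, prompt_words={"young", "people", "life"})
print(features)
```

In a setup like the one described, the three values per essay would be appended to the baseline feature vector before training the classifier. Real thresholds would normally be normalized by corpus size (for example, occurrences per million words), which this toy version omits.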

Corpus Frequencies in Morphological Inflection: Do They Matter?

Computation and Language

Helps computers learn word forms better from real text.

27 Oct 2025

Assessing the validity of new paradigmatic complexity measures as criterial features for proficiency in L2 writings in English

Computation and Language

Helps computers grade student writing better.

13 Mar 2025

Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information

Computation and Language

Helps computers judge speaking better by checking words and grammar.

19 Jun 2025

Country of Origin

🇺🇸 United States

Page Count

13 pages
