CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
By: Masato Kikuchi, Masatsugu Ono, Toshioki Soga and more
Potential Business Impact:
Helps language learners by labeling word meanings with difficulty levels.
Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
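The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that a large language model measures semantic similarity between WordNet sense definitions and English Vocabulary Profile Online entries. Here a toy bag-of-words cosine similarity stands in for the LLM, the function names (`embed`, `cosine`, `annotate`) are hypothetical, and the sample definitions and CEFR levels are invented for illustration rather than taken from either resource.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM representation: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def annotate(sense_definitions, profile_entries):
    """Give each sense the CEFR level of its most similar profile entry."""
    levels = {}
    for sense_id, definition in sense_definitions.items():
        sense_vec = embed(definition)
        best = max(
            profile_entries,
            key=lambda entry: cosine(sense_vec, embed(entry["definition"])),
        )
        levels[sense_id] = best["level"]
    return levels


# Invented example data (assumption): two senses of "bank" and two
# profile-style entries with made-up CEFR levels.
senses = {
    "bank.n.01": "sloping land beside a body of water such as a river",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}
entries = [
    {"definition": "an organization where people keep and borrow money", "level": "A1"},
    {"definition": "the land along the side of a river", "level": "B2"},
]

print(annotate(senses, entries))
```

In a realistic setting the `embed` function would be replaced by a model-based similarity measure, and candidate entries would presumably be restricted to those for the same headword; both points are assumptions, since the abstract does not give these details.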
Similar Papers
Classifying German Language Proficiency Levels Using Large Language Models
Computation and Language
Helps teachers know how well students read German.
UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
Computation and Language
Helps computers judge how hard a text is.
Ace-CEFR -- A Dataset for Automated Evaluation of the Linguistic Difficulty of Conversational Texts for LLM Applications
Computation and Language
Helps computers understand easy and hard writing.