
CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning

Published: October 21, 2025 | arXiv ID: 2510.18466v1

By: Masato Kikuchi, Masatsugu Ono, Toshioki Soga, and more

Potential Business Impact:

Helps language learners by labeling individual word meanings with CEFR difficulty levels.

Business Areas:
Semantic Web, Internet Services

Although WordNet is a valuable resource owing to its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this, we developed a WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our method, we constructed a large-scale corpus containing both sense and CEFR-level information from our annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on our corpus perform comparably to those trained on gold-standard annotations. Furthermore, by combining our corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81, indicating the high accuracy of our annotations. Our annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
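The abstract describes two computable steps: transferring a CEFR level to each WordNet sense from the most semantically similar English Vocabulary Profile entry, and scoring classifiers with Macro-F1. The sketch below illustrates both under stated assumptions. The paper uses a large language model to judge similarity; this toy swaps in plain bag-of-words cosine similarity instead. The sense IDs, definitions, CEFR labels, the `threshold` parameter, and all function names are hypothetical illustrations, not data from EVP Online or the authors' released code.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector for a definition string."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def annotate_senses(senses, evp_entries, threshold=0.2):
    """Give each sense the CEFR level of its most similar EVP-style entry.

    ASSUMPTION: cosine similarity stands in for the paper's LLM-based scoring.
    senses: dict of sense_id -> definition (hypothetical format)
    evp_entries: list of (definition, cefr_level) for the same lemma
    Returns sense_id -> level, or None when no entry reaches `threshold`.
    """
    labels = {}
    for sense_id, definition in senses.items():
        scored = [(cosine(bow(definition), bow(d)), lvl) for d, lvl in evp_entries]
        score, level = max(scored, default=(0.0, None))
        labels[sense_id] = level if score >= threshold else None
    return labels


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    classes = set(gold) | set(pred)
    f1s = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0


# Toy, invented data for the lemma "bank" (not real WordNet/EVP text).
senses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits and lends money",
}
evp_entries = [
    ("an organization that keeps and lends money", "A1"),
    ("the land along the side of a river", "B1"),
]

labels = annotate_senses(senses, evp_entries)
print(labels)  # {'bank.n.01': 'B1', 'bank.n.02': 'A1'}
print(round(macro_f1(["A1", "B1", "B1", "C1"], ["A1", "B1", "C1", "C1"]), 3))  # 0.778
```

Macro-F1 averages per-level F1 without weighting by frequency, so rare CEFR levels count as much as common ones; that is why it is a demanding summary for a level classifier.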

Country of Origin
🇯🇵 Japan


Page Count
15 pages

Category
Computer Science: Computation and Language