
Second language Korean Universal Dependency treebank v1.2: Focus on data augmentation and annotation scheme refinement

Published: March 18, 2025 | arXiv ID: 2503.14718v1

By: Hakyung Sung, Gyu-Ho Shin

Potential Business Impact:

Helps computers analyze the grammar of Korean written by non-native speakers.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Technical Abstract:

We expand the second language (L2) Korean Universal Dependencies (UD) treebank with 5,454 manually annotated sentences. We also revise the annotation guidelines to align more closely with the UD framework. Using the enhanced treebank, we fine-tune three Korean language models and evaluate their performance on in-domain and out-of-domain L2 Korean datasets. The results show that fine-tuning significantly improves performance across various metrics, highlighting the importance of well-tailored L2 datasets when fine-tuning general-purpose language models trained on first-language (L1) data for the morphosyntactic analysis of L2 data.
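The abstract does not name its evaluation metrics here. For UD treebanks, common choices are UPOS tagging accuracy, unlabeled attachment score (UAS), and labeled attachment score (LAS). As a minimal sketch, assuming gold and predicted analyses are stored as CoNLL-U text with identical word tokenization, these scores can be computed as below. This is not the authors' evaluation code: the helper names (`read_conllu`, `evaluate`) and the toy Korean sentence are hypothetical, and the official CoNLL 2018 shared task script additionally aligns mismatched tokenizations.

```python
"""Minimal sketch: token-level UPOS accuracy, UAS, and LAS from CoNLL-U text.

Assumes gold and predicted files share identical tokenization (same words in
the same order). Hypothetical helper names; not the paper's evaluation code.
"""
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    upos: str
    head: str
    deprel: str


def read_conllu(text: str) -> List[Word]:
    """Collect syntactic words, skipping comments, multiword tokens, and empty nodes."""
    words: List[Word] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line = sentence boundary; '#' = metadata comment
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 CoNLL-U columns, got {len(cols)}: {line!r}")
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword token range or enhanced-graph empty node
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        words.append(Word(upos=cols[3], head=cols[6], deprel=cols[7]))
    return words


def evaluate(gold_text: str, pred_text: str) -> Dict[str, float]:
    gold, pred = read_conllu(gold_text), read_conllu(pred_text)
    if len(gold) != len(pred):
        raise ValueError("tokenization mismatch; this sketch requires aligned words")
    n = len(gold)
    if n == 0:
        raise ValueError("no words to evaluate")
    pairs = list(zip(gold, pred))
    upos = sum(g.upos == p.upos for g, p in pairs)
    uas = sum(g.head == p.head for g, p in pairs)
    # Compare only the universal relation (e.g. 'nmod:poss' -> 'nmod');
    # a design choice of this sketch.
    las = sum(
        g.head == p.head and g.deprel.split(":")[0] == p.deprel.split(":")[0]
        for g, p in pairs
    )
    return {"UPOS": upos / n, "UAS": uas / n, "LAS": las / n}


# Hypothetical toy data: one Korean sentence, split into eojeol-level words.
GOLD = (
    "# text = 나는 학교에 간다\n"
    "1\t나는\t나+는\tPRON\tNP+JX\t_\t3\tnsubj\t_\t_\n"
    "2\t학교에\t학교+에\tNOUN\tNNG+JKB\t_\t3\tobl\t_\t_\n"
    "3\t간다\t가+ㄴ다\tVERB\tVV+EF\t_\t0\troot\t_\t_\n"
)
# Prediction with one UPOS error (word 1) and one relation error (word 2).
PRED = (
    "# text = 나는 학교에 간다\n"
    "1\t나는\t나+는\tNOUN\tNP+JX\t_\t3\tnsubj\t_\t_\n"
    "2\t학교에\t학교+에\tNOUN\tNNG+JKB\t_\t3\tobj\t_\t_\n"
    "3\t간다\t가+ㄴ다\tVERB\tVV+EF\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    scores = evaluate(GOLD, PRED)
    print({k: round(v, 3) for k, v in scores.items()})
    # → {'UPOS': 0.667, 'UAS': 1.0, 'LAS': 0.667}
```

Scoring only the universal part of each relation label is a simplification; removing the `split(":")` calls would score full language-specific labels instead.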


Country of Origin:

🇺🇸 United States

Page Count:

7 pages
