Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies
By: Wenxi Li
Potential Business Impact:
Helps computers compare sentence meaning across languages more accurately.
Universal Dependencies (UD), while widely regarded as the most successful linguistic framework for cross-lingual syntactic representation, remains underexplored in terms of its effectiveness. This paper addresses this gap by integrating UD into pretrained language models and assessing whether UD can improve their performance on a cross-lingual adversarial paraphrase identification task. Experimental results show that incorporating UD yields significant improvements in accuracy and $F_1$ score, with average gains of 3.85% and 6.08% respectively. These enhancements narrow the performance gap between pretrained models and large language models on some language pairs, and even surpass the latter on others. Furthermore, the UD-based similarity score between a given language and English is positively correlated with model performance in that language. Both findings highlight the validity and potential of UD in out-of-domain tasks.
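The abstract does not give the exact formula for the UD-based similarity score, so the following is a minimal sketch of one plausible instantiation: comparing the frequency distributions of UD dependency-relation labels in two treebanks via cosine similarity. The treebank file names and the choice of cosine similarity are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a UD-based cross-lingual similarity score
# (the paper does not specify its formula): compare the frequency
# distributions of dependency-relation labels in two CoNLL-U treebanks.
from collections import Counter
from math import sqrt


def relation_distribution(conllu_path: str) -> Counter:
    """Count UD dependency relations (DEPREL, column 8) in a CoNLL-U file."""
    counts: Counter = Counter()
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split("\t")
            # Plain token lines have 10 columns and an integer ID;
            # this skips multiword-token ranges ("1-2") and empty nodes ("1.1").
            if len(cols) == 10 and cols[0].isdigit():
                counts[cols[7]] += 1  # DEPREL, e.g. "nsubj", "obj", "amod"
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two relation-frequency vectors."""
    labels = set(a) | set(b)
    dot = sum(a[l] * b[l] for l in labels)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Usage (treebank paths are placeholders for real UD releases):
# sim = cosine_similarity(relation_distribution("en_ewt-ud-train.conllu"),
#                         relation_distribution("de_gsd-ud-train.conllu"))
```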
Similar Papers
UniBERT: Adversarial Training for Language-Universal Representations
Computation and Language
Helps computers understand many languages better.
Enhancing Coreference Resolution with Pretrained Language Models: Bridging the Gap Between Syntax and Semantics
Computation and Language
Helps computers understand who or what "they" refers to.
Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Computation and Language
Helps computers parse sentence structure more accurately.