Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
By: Zilong Li, Jie Cao
Potential Business Impact:
Helps computers translate old texts by learning from ancient notes.
Historically, readers translated Classical Chinese into Japanese by adding annotations around each character. We abstract this process as sequence tagging tasks and fit them into modern language technologies. Research on this annotation and translation system faces a low-resource problem. We alleviate this problem by introducing an LLM-based annotation pipeline and constructing a new dataset from digitized open-source translation data. We show that, in the low-resource setting, introducing auxiliary Chinese NLP tasks improves the training of the sequence tagging tasks. We also evaluate the performance of large language models: they achieve high scores in direct machine translation but struggle when asked to annotate characters. Our method could serve as a supplement to LLMs.
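To make the "annotation as sequence tagging" idea concrete, here is a minimal sketch in Python. It is not the authors' model or tag set, which the abstract does not specify. It assumes one tag per character, drawn from a small subset of the traditional reading-order marks (レ, 一, 二), and shows how such tags determine the Japanese reading order. The function name `reading_order` and the use of an empty string for "no mark" are hypothetical choices for illustration; compound and higher-level marks are deliberately ignored.

```python
def reading_order(chars, marks):
    """Reorder characters according to per-character reading-order tags.

    Simplified assumptions (not taken from the paper):
      "レ" : read this character right after the next character is read
      "二" : read this character right after the character tagged "一"
      ""   : no mark, read in place
    Other marks (三, 上/中/下, combined marks) are not handled.
    """
    followers = {i: [] for i in range(len(chars))}
    order = []

    def emit(i):
        # Read character i, then anything deferred until after it.
        order.append(i)
        for j in followers[i]:
            emit(j)

    waiting = None  # index of a pending "二" character
    for i, mark in enumerate(marks):
        # A preceding "レ" character is read right after this one.
        if i > 0 and marks[i - 1] == "レ":
            followers[i].append(i - 1)
        if mark == "レ":
            continue          # deferred to the next character
        if mark == "二":
            waiting = i       # deferred until "一" is read
            continue
        emit(i)
        if mark == "一" and waiting is not None:
            emit(waiting)
            waiting = None
    return "".join(chars[i] for i in order)


# One tag per character, exactly as in a sequence tagging task.
print(reading_order("有朋自遠方来", ["レ", "", "二", "", "一", ""]))
# → 朋有遠方自来
```

Framed this way, predicting the marks is an ordinary token-level classification problem, and the reordering step is deterministic once the tags are known.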