
Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis

Published: December 26, 2025 | arXiv ID: 2512.22100v1

By: Duygu Altinok

Potential Business Impact:

Tests how well language models understand Turkish text and sentiment.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Evaluating the performance of various model architectures, such as transformers, large language models (LLMs), and other NLP systems, requires comprehensive benchmarks that measure performance across multiple dimensions. Among these, the evaluation of natural language understanding (NLU) is particularly critical as it serves as a fundamental criterion for assessing model capabilities. Thus, it is essential to establish benchmarks that enable thorough evaluation and analysis of NLU abilities from diverse perspectives. While the GLUE benchmark has set a standard for evaluating English NLU, similar benchmarks have been developed for other languages, such as CLUE for Chinese, FLUE for French, and JGLUE for Japanese. However, no comparable benchmark currently exists for the Turkish language. To address this gap, we introduce TrGLUE, a comprehensive benchmark encompassing a variety of NLU tasks for Turkish. In addition, we present SentiTurca, a specialized benchmark for sentiment analysis. To support researchers, we also provide fine-tuning and evaluation code for transformer-based models, facilitating the effective use of these benchmarks. TrGLUE comprises Turkish-native corpora curated to mirror the domains and task formulations of GLUE-style evaluations, with labels obtained through a semi-automated pipeline that combines strong LLM-based annotation, cross-model agreement checks, and subsequent human validation. This design prioritizes linguistic naturalness, minimizes direct translation artifacts, and yields a scalable, reproducible workflow. With TrGLUE, our goal is to establish a robust evaluation framework for Turkish NLU, empower researchers with valuable resources, and provide insights into generating high-quality semi-automated datasets.
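The abstract names cross-model agreement checks as one stage of the labeling pipeline but does not describe the mechanics. The following is a minimal Python sketch of one way such a check could work, not the author's implementation. The function name `triage`, the `Decision` record, the `min_agreement` threshold, and the rule that disagreements are routed to human review are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    label: Optional[str]  # agreed label, or None if agreement was too low
    agreement: float      # share of annotator models voting for the top label
    needs_review: bool    # True when the item should go to a human validator


def triage(votes: List[str], min_agreement: float = 1.0) -> Decision:
    """Accept a label only when enough annotator models agree on it.

    `votes` holds one label per annotator model (hypothetical inputs).
    `min_agreement` is an assumed threshold; 1.0 means unanimous agreement.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    top_label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement >= min_agreement:
        return Decision(top_label, agreement, needs_review=False)
    # Below the threshold: withhold the label and flag the item for a human.
    return Decision(None, agreement, needs_review=True)


# Hypothetical votes from three annotator models on two sentiment items.
unanimous = triage(["positive", "positive", "positive"])
split = triage(["positive", "negative", "positive"])
print(unanimous)
print(split)
```

With the default threshold, only unanimous items are accepted automatically; lowering `min_agreement` trades human review effort for a higher risk of label noise.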

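Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları (English: TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement)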

Computation and Language

Tests how well computers understand Turkish language.

18 Aug 2025 1

88%

Developing a Comprehensive Framework for Sentiment Analysis in Turkish

Computation and Language

Makes computers understand feelings in text better.

29 Nov 2025 0

88%

TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages

Computation and Language

Tests how well language models understand Turkic languages.

16 Feb 2025 1



Page Count

77 pages
