TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Published: November 10, 2025 | arXiv ID: 2511.07595v1

By: Özay Ezerceli, Gizem Gümüşçekiçci, Tuğba Erkoç and more

Potential Business Impact:

Finds Turkish information much faster.

Business Areas:
Semantic Search, Internet Services

In this work, we introduce TurkEmbed4Retrieval, a retrieval-specialized variant of the TurkEmbed model, which was originally designed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. By fine-tuning the base model on the MS MARCO TR dataset with Matryoshka representation learning and a tailored multiple negatives ranking loss, we achieve state-of-the-art (SOTA) performance on Turkish retrieval tasks. Extensive experiments demonstrate that our model outperforms Turkish ColBERT by 19.26% on key retrieval metrics for the SciFact TR dataset, thereby establishing a new benchmark for Turkish information retrieval.
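The abstract names two training techniques without detailing them. The sketch below shows, in plain Python, how an in-batch multiple negatives ranking loss can be combined with Matryoshka-style prefix truncation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `scale` value of 20.0, the nested dimensions, and the uniform weights are all hypothetical, and the "tailored" variant of the loss used in the paper is not described here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mnr_loss(queries, passages, scale=20.0):
    """In-batch multiple negatives ranking loss.

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch acts as a negative. `scale` (assumed value)
    multiplies the cosine similarities before the softmax.
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [scale * sum(a * b for a, b in zip(qi, pj)) for pj in p]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching passage as the target class.
        total += log_z - logits[i]
    return total / len(q)


def matryoshka_mnr_loss(queries, passages, dims=(2, 4), weights=None, scale=20.0):
    """Weighted sum of the ranking loss over nested embedding prefixes.

    Each entry of `dims` keeps only the first d components, so shorter
    prefixes are also trained to rank well on their own.
    `dims` and `weights` are illustrative assumptions.
    """
    weights = weights or [1.0] * len(dims)
    return sum(
        w * mnr_loss([v[:d] for v in queries], [v[:d] for v in passages], scale)
        for d, w in zip(dims, weights)
    )


# Toy 4-dimensional embeddings: passage i is built to match query i.
queries = [[1.0, 0.1, 0.0, 0.2], [0.0, 1.0, 0.3, 0.0], [-1.0, -0.2, 0.5, 0.1]]
passages = [[0.9, 0.0, 0.1, 0.1], [0.1, 0.8, 0.2, 0.0], [-0.8, -0.1, 0.4, 0.0]]

aligned = matryoshka_mnr_loss(queries, passages)
misaligned = matryoshka_mnr_loss(queries, list(reversed(passages)))
print(f"aligned loss: {aligned:.4f}, misaligned loss: {misaligned:.4f}")
```

With correctly paired queries and passages the loss is lower than with the passages reversed, which is the signal the fine-tuning step optimizes. In practice this would be computed on model outputs with an automatic differentiation framework rather than on hand-written lists.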

Country of Origin
🇹🇷 Turkey

Page Count
4 pages

Category
Computer Science:
Information Retrieval