TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task
By: Özay Ezerceli, Gizem Gümüşçekiçci, Tuğba Erkoç and more
Potential Business Impact:
Finds Turkish information much faster.
In this work, we introduce TurkEmbed4Retrieval, a retrieval-specialized variant of the TurkEmbed model, which was originally designed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. By fine-tuning the base model on the MS MARCO TR dataset using advanced training techniques, including Matryoshka representation learning and a tailored multiple negatives ranking loss, we achieve state-of-the-art (SOTA) performance on Turkish retrieval tasks. Extensive experiments demonstrate that our model outperforms Turkish ColBERT by 19.26% on key retrieval metrics for the SciFact TR dataset, thereby establishing a new benchmark for Turkish information retrieval.
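The two training techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the loss was tailored, so this shows only the standard in-batch form of a multiple negatives ranking loss, averaged over truncated embedding prefixes in the Matryoshka style. The function names, the `scale` value, the prefix sizes in `dims`, the equal weighting of prefixes, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(queries, passages, scale=20.0):
    """In-batch multiple negatives ranking loss.

    passages[i] is the positive for queries[i]; every other passage in the
    batch is treated as a negative. The loss is the mean cross-entropy of
    picking the correct passage from the scaled cosine similarities.
    `scale` is an assumed temperature, not a value from the paper.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, passage) for passage in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def matryoshka_mnr_loss(queries, passages, dims=(2, 4), scale=20.0):
    """Average the ranking loss over truncated embedding prefixes.

    Each entry in `dims` keeps only the first d coordinates, so short
    prefixes are trained to be useful on their own. Equal weighting of
    the prefixes is an assumption of this sketch.
    """
    losses = []
    for d in dims:
        truncated_queries = [q[:d] for q in queries]
        truncated_passages = [p[:d] for p in passages]
        losses.append(mnr_loss(truncated_queries, truncated_passages, scale))
    return sum(losses) / len(losses)


# Toy 4-dimensional embeddings (hypothetical values).
queries = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
aligned = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.8, 0.0, 0.4]]
swapped = [aligned[1], aligned[0]]  # positives paired with the wrong query

loss_aligned = matryoshka_mnr_loss(queries, aligned)
loss_swapped = matryoshka_mnr_loss(queries, swapped)
print(loss_aligned < loss_swapped)
```

In this toy batch the loss is lower when each query is paired with its matching passage than when the pairs are swapped, at both the 2-dimensional and the full 4-dimensional prefix. That is the behavior such an objective is meant to reward.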
Similar Papers
TurkEmbed: Turkish Embedding Model on NLI & STS Tasks
Computation and Language
Helps computers understand Turkish text much better.
TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
Computation and Language
Finds Turkish information much faster with less data.
Semantic Search for Information Retrieval
Information Retrieval
Helps computers find information by understanding meaning.