Score: 3

ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection

Published: August 15, 2025 | arXiv ID: 2508.11281v1

By: Axel Delaval, Shujian Yang, Haicheng Wang, and more

Potential Business Impact:

Detects toxic online comments in French more accurately.

Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, large-scale datasets. In this work, we introduce TOXIFRENCH, a new public benchmark of 53,622 French online comments, constructed via a semi-automated annotation pipeline that reduces manual labeling to only 10% through high-confidence LLM-based pre-annotation and human verification. We then benchmark a broad range of models and uncover a counterintuitive insight: Small Language Models (SLMs) outperform many larger models in robustness and generalization on the toxicity detection task. Motivated by this finding, we propose a novel Chain-of-Thought (CoT) fine-tuning strategy using a dynamic weighted loss that progressively emphasizes the model's final decision, significantly improving faithfulness. Our fine-tuned 4B model achieves state-of-the-art performance, improving its F1 score by 13% over its baseline and outperforming LLMs such as GPT-4o and Gemini-2.5. Further evaluation on a cross-lingual toxicity benchmark demonstrates strong multilingual ability, suggesting that our methodology can be effectively extended to other languages and safety-critical classification tasks.
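The abstract does not spell out how the dynamic weighted loss is computed, so the following is only a minimal sketch of the general idea: tokens belonging to the final decision receive a weight that grows as training progresses, while reasoning tokens keep a weight of 1. The linear ramp, the weight range (1.0 to 5.0), and the names `decision_weight` and `weighted_cot_loss` are assumptions made for illustration, not the authors' implementation.

```python
def decision_weight(step, total_steps, w_start=1.0, w_end=5.0):
    """Hypothetical schedule: linearly ramp the weight on final-decision tokens.

    The paper only says the loss "progressively emphasizes" the final decision;
    the linear shape and the 1.0 -> 5.0 range are assumptions.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def weighted_cot_loss(token_nll, is_decision, step, total_steps):
    """Weighted mean of per-token negative log-likelihoods.

    token_nll   : per-token losses for one CoT sequence (list of floats)
    is_decision : parallel list of booleans, True for final-answer tokens
    """
    w = decision_weight(step, total_steps)
    weights = [w if d else 1.0 for d in is_decision]
    total = sum(wi * li for wi, li in zip(weights, token_nll))
    return total / sum(weights)


# Two reasoning tokens followed by one (poorly predicted) decision token.
nll = [1.0, 1.0, 3.0]
mask = [False, False, True]
early = weighted_cot_loss(nll, mask, step=0, total_steps=100)
late = weighted_cot_loss(nll, mask, step=100, total_steps=100)
```

With this schedule, the same mistake on the decision token contributes more to the loss late in training than early on, which is one plausible way to push the model's final verdict to stay consistent with its reasoning.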

CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction

Machine Learning (CS)

Finds drug dangers before testing on people.

5 Aug 2025 1

89%

Through the Valley: Path to Effective Long CoT Training for Small Language Models

Computation and Language

Makes small AI think better, avoids mistakes.

9 Jun 2025 1

89%

ylmmcl at Multilingual Text Detoxification 2025: Lexicon-Guided Detoxification and Classifier-Gated Rewriting

Computation and Language

Cleans up bad words in many languages.

24 Jul 2025 2

Country of Origin

🇨🇳 China

Repos / Data Links

github.com github.com huggingface.co

Page Count

14 pages
