MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

Published: January 13, 2026 | arXiv ID: 2601.08564v1

By: Yongtong Gu, Songze Li, Xia Hu

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.
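The abstract reports results as an Attack Success Rate (ASR) but does not define it. A minimal sketch follows, assuming the common definition: the fraction of rewritten AI-generated texts that a detector no longer flags as AI. The function name `attack_success_rate`, the `toy_detector` stand-in, and the sample texts are all hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, Sequence


def attack_success_rate(
    rewritten_texts: Sequence[str],
    detector_flags_ai: Callable[[str], bool],
) -> float:
    """Fraction of rewritten AI texts the detector does NOT flag as AI.

    Assumed definition; the paper may compute ASR differently.
    """
    if not rewritten_texts:
        raise ValueError("need at least one text to compute ASR")
    evaded = sum(1 for text in rewritten_texts if not detector_flags_ai(text))
    return evaded / len(rewritten_texts)


def toy_detector(text: str) -> bool:
    """Hypothetical stand-in for a black-box detector: flags one keyword."""
    return "delve" in text.lower()


# Hypothetical rewritten outputs; only the second one still trips the toy detector.
samples = [
    "We looked at how the river changed over ten years.",
    "Let us delve into the multifaceted landscape of rivers.",
    "Honestly, the data surprised me more than I expected.",
    "The survey ran for three weeks in early spring.",
]

asr = attack_success_rate(samples, toy_detector)
print(f"ASR: {asr:.2f}")
```

In a black-box setting the evader only observes the detector's decision, which is why the sketch takes a callable that returns a boolean rather than model internals or scores.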

DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution

Computation and Language

Finds where computers start writing in human text.

4 Dec 2025 1

88%

Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

Computation and Language

Makes AI-written text harder to spot.

30 May 2025 2

88%

TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors

Cryptography and Security

Helps tell if writing is from a person or computer.

10 Mar 2025 2
