MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark
By: Anyang Song , Ying Cheng , Yiqian Xu and more
Potential Business Impact:
Makes fake text harder to spot.
Large Language Models (LLMs) alignment is constantly evolving. Machine-Generated Text (MGT) is becoming increasingly difficult to distinguish from Human-Written Text (HWT). This has exacerbated abuse issues such as fake news and online fraud. Fine-tuned detectors' generalization ability is highly dependent on dataset quality, and simply expanding the sources of MGT is insufficient. Further augment of generation process is required. According to HC-Var's theory, enhancing the alignment of generated text can not only facilitate attacks on existing detectors to test their robustness, but also help improve the generalization ability of detectors fine-tuned on it. Therefore, we propose \textbf{M}achine-\textbf{A}ugment-\textbf{G}enerated Text via \textbf{A}lignment (MAGA). MAGA's pipeline achieves comprehensive alignment from prompt construction to reasoning process, among which \textbf{R}einforced \textbf{L}earning from \textbf{D}etectors \textbf{F}eedback (RLDF), systematically proposed by us, serves as a key component. In our experiments, the RoBERTa detector fine-tuned on MAGA training set achieved an average improvement of 4.60\% in generalization detection AUC. MAGA Dataset caused an average decrease of 8.13\% in the AUC of the selected detectors, expecting to provide indicative significance for future research on the generalization detection ability of detectors.
Similar Papers
Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors
Computation and Language
Makes AI-written text harder to spot.
AI Generated Text Detection Using Instruction Fine-tuned Large Language and Transformer-Based Models
Computation and Language
Finds fake writing made by computers.
AI-generated Text Detection with a GLTR-based Approach
Computation and Language
Finds fake writing made by computers.