Defending LLM Watermarking Against Spoofing Attacks with Contrastive Representation Learning
By: Li An, Yujian Liu, Yepeng Liu, and more
Potential Business Impact:
Stops attackers from altering the meaning of watermarked AI text.
Watermarking has emerged as a promising technique for detecting texts generated by LLMs. Current research has primarily focused on three design criteria: high quality of the watermarked text, high detectability, and robustness against removal attacks. However, security against spoofing attacks remains relatively understudied. For example, a piggyback attack can maliciously alter the meaning of watermarked text, transforming it into hate speech while preserving the original watermark, thereby damaging the reputation of the LLM provider. We identify two core challenges that make defending against spoofing difficult: (1) the need for watermarks to be both sensitive to semantic-distorting changes and insensitive to semantic-preserving edits, and (2) the contradiction between the need to detect global semantic shifts and the local, auto-regressive nature of most watermarking schemes. To address these challenges, we propose a semantic-aware watermarking algorithm that embeds watermarks post hoc into a given target text while preserving its original meaning. Our method introduces a semantic mapping model that guides the generation of a green-red token list and is contrastively trained to be sensitive to semantic-distorting changes and insensitive to semantic-preserving ones. Experiments on two standard benchmarks demonstrate strong robustness against removal attacks and security against spoofing attacks, including sentiment reversal and toxic content insertion, while maintaining high watermark detectability. Our approach offers a significant step toward more secure and semantically aware watermarking for LLMs. Our code is available at https://github.com/UCSB-NLP-Chang/contrastive-watermark.
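
The abstract describes the method only at a high level, so below is a minimal, hedged sketch of the core idea: a semantic mapping model trained with a contrastive (triplet-style) objective, whose output key seeds the green-red vocabulary split. This is not the authors' implementation (see the repository linked above for that); the class names, dimensions, margin loss, and seeding scheme are all illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn

class SemanticMapper(nn.Module):
    """Maps a frozen sentence embedding to a semantic watermark key.
    Dimensions are assumptions, not the paper's actual configuration."""
    def __init__(self, embed_dim: int = 768, key_dim: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(), nn.Linear(512, key_dim)
        )

    def forward(self, sent_emb: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so cosine similarity is well-behaved.
        return F.normalize(self.proj(sent_emb), dim=-1)

def contrastive_loss(anchor, positive, negative, margin: float = 0.5):
    """Triplet-style objective standing in for the paper's contrastive
    training: pull semantic-preserving paraphrases (positive) toward the
    anchor text's key, push semantic-distorting edits (negative) away."""
    pos_sim = F.cosine_similarity(anchor, positive, dim=-1)
    neg_sim = F.cosine_similarity(anchor, negative, dim=-1)
    return F.relu(margin - pos_sim + neg_sim).mean()

def green_list_mask(key: torch.Tensor, vocab_size: int, gamma: float = 0.5):
    """Derive a deterministic green-token mask from a single key vector,
    so texts with the same meaning share the same green-red split.
    The quantization-based seeding here is a hypothetical choice."""
    seed = int(torch.round((key + 1.0) * 1000).sum().item())
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(vocab_size, generator=gen)
    mask = torch.zeros(vocab_size, dtype=torch.bool)
    mask[perm[: int(gamma * vocab_size)]] = True  # gamma = green fraction
    return mask

Because the green list is derived from a key that depends only on sentence semantics, semantic-preserving paraphrases reproduce the same vocabulary split (keeping the watermark detectable), while semantic-distorting edits shift the key and invalidate the watermark, which is exactly the security property the abstract targets.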
Similar Papers
Watermarking Needs Input Repetition Masking
Machine Learning (CS)
Makes AI text harder to spot, even with watermarks.
LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data
Computation and Language
Marks text so you can tell if an AI trained on it.
DualGuard: Dual-stream Large Language Model Watermarking Defense against Paraphrase and Spoofing Attack
Cryptography and Security
Protects AI writing from being faked or changed.