Score: 3

How Good is Post-Hoc Watermarking With Language Model Rephrasing?

Published: December 18, 2025 | arXiv ID: 2512.16904v1

By: Pierre Fernandez, Tom Sander, Hady Elsahar, and more

BigTech Affiliations: Meta

Potential Business Impact:

Makes AI writing traceable, even after it's written.

Business Areas:
Text Analytics, Data and Analytics, Software

Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking*, in which an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents or to detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which are constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
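To make the Gumbel-max scheme mentioned in the abstract concrete, here is a minimal self-contained sketch of the general idea: the sampler draws keyed pseudorandom uniforms seeded by the recent context, picks the token maximizing u_i^(1/p_i) (an exact sample from the model distribution), and the detector recomputes the same uniforms and scores how large they are at the observed tokens. The vocabulary size, key, context length, and all function names here are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import math
import random

VOCAB = 50  # toy vocabulary size (illustrative assumption)

def seeded_uniforms(context, key="secret-key"):
    """Pseudorandom uniforms in (0, 1), keyed on the watermark key and the
    recent context tokens, so generator and detector derive the same values."""
    seed = hashlib.sha256(f"{key}|{context}".encode()).digest()
    rng = random.Random(seed)
    return [rng.random() for _ in range(VOCAB)]

def gumbel_max_sample(probs, context, key="secret-key"):
    """Aaronson-style Gumbel-max sampling: argmax_i u_i ** (1 / p_i).
    This is distributed exactly like sampling from `probs`, but the choice
    is reproducible given the key, which is what detection exploits."""
    u = seeded_uniforms(context, key)
    return max(range(len(probs)),
               key=lambda i: u[i] ** (1.0 / max(probs[i], 1e-9)))

def detection_score(tokens, context_len=1, key="secret-key"):
    """Average per-token score -log(1 - u_token). For unwatermarked text this
    averages about 1; watermarked text scores markedly higher."""
    score = 0.0
    for t in range(context_len, len(tokens)):
        u = seeded_uniforms(tuple(tokens[t - context_len:t]), key)
        score += -math.log(1.0 - u[tokens[t]])
    return score / (len(tokens) - context_len)
```

In a post-hoc setting, `probs` would come from the rephrasing LLM's next-token distribution at each step of the rewrite; the toy sketch above just illustrates why the same seeded uniforms let a key-holder detect the signal without the original model.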

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
24 pages

Category
Computer Science:
Cryptography and Security