MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
By: Yizhou Zhao, Zhiwei Steven Wu, Adam Block
Potential Business Impact:
Makes AI writing traceable, even after edits.
Watermarking aims to embed hidden signals in generated text that can be reliably detected by anyone holding a secret key. Open-weight language models pose acute challenges for such schemes because the inference-time interventions that dominate contemporary approaches cannot be enforced once model weights are public. Existing watermarking techniques for open-weight models, such as the recently proposed GaussMark, typically rely on small modifications to model weights, which can yield signals detectable to those equipped with a secret key, but achieving detection power comparable to inference-time watermarks generally requires weight perturbations large enough to noticeably reduce generation quality. We introduce MarkTune, a theoretically principled, on-policy fine-tuning framework that treats the GaussMark signal as a reward while simultaneously regularizing against degradation in text quality. We derive MarkTune as an improvement on GaussMark and demonstrate that it consistently improves the quality-detectability trade-off over GaussMark by steering finer-grained, watermark-aware weight updates within the model's representation space while preserving generation quality. Empirically, we show that MarkTune pushes the quality-detectability frontier of GaussMark close to that of inference-time watermarking, remains robust to paraphrasing and fine-tuning attacks, and exhibits strong generalization: a model fine-tuned on one dataset retains substantial watermark detection power on unseen datasets. Together, these results establish MarkTune as a general strategy for embedding robust, high-quality watermarks into open-weight LMs.
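To make the abstract's objective concrete, below is a minimal sketch (not the authors' code) of the kind of reward MarkTune's description suggests: a GaussMark-style detection statistic used as the reward signal, minus a penalty for drifting from the original model's token probabilities. All names and the specific formulas (`detection_score`, `kl_to_reference`, `beta`) are illustrative assumptions rather than the paper's actual notation or implementation.

```python
# Sketch of an on-policy reward combining a GaussMark-style watermark
# statistic with a quality-preservation regularizer. Hypothetical, for
# illustration only; the paper's exact objective may differ.
import numpy as np

rng = np.random.default_rng(0)

def detection_score(grad_wrt_key_weights: np.ndarray, secret_key: np.ndarray) -> float:
    """Normalized correlation between the secret Gaussian perturbation (the key)
    and the generated text's log-likelihood gradient on the watermarked weight
    block. Higher values mean the watermark is easier to detect."""
    return float(secret_key @ grad_wrt_key_weights) / (
        np.linalg.norm(secret_key) * np.linalg.norm(grad_wrt_key_weights) + 1e-8
    )

def kl_to_reference(logp_current: np.ndarray, logp_reference: np.ndarray) -> float:
    """On-policy KL proxy: average gap between the fine-tuned model's and the
    reference model's per-token log-probabilities (quality regularizer)."""
    return float(np.mean(logp_current - logp_reference))

def marktune_style_reward(grad, key, logp_cur, logp_ref, beta=0.1):
    """Reward = watermark detectability minus a quality-degradation penalty."""
    return detection_score(grad, key) - beta * kl_to_reference(logp_cur, logp_ref)

# Toy usage with random stand-ins for one sampled generation.
d, T = 4096, 32                          # watermarked-block size, sequence length
key = rng.normal(size=d)                 # secret Gaussian key
grad = 0.3 * key + rng.normal(size=d)    # gradient partially aligned with the key
logp_cur = rng.normal(-2.0, 0.1, T)      # fine-tuned model's token log-probs
logp_ref = logp_cur - rng.normal(0.05, 0.02, T)  # reference model's token log-probs
print(marktune_style_reward(grad, key, logp_cur, logp_ref))
```

In this reading, raising the reward pushes weight updates in directions that strengthen the key-aligned signal, while the `beta`-weighted penalty keeps the fine-tuned policy close to the original model, which is the quality-detectability trade-off the abstract refers to.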
Similar Papers
Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models
Machine Learning (CS)
Makes AI writing show it's from AI.
Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking
Computation and Language
Protects writing from being copied by AI.
Learning to Watermark: A Selective Watermarking Framework for Large Language Models via Multi-Objective Optimization
Cryptography and Security
Makes AI writing sound natural, not fake.