StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching
By: Robert Dilworth
Potential Business Impact:
Hides writing style to protect author privacy.
Stylometry--the identification of an author through analysis of a text's style (i.e., authorship attribution)--serves many constructive purposes: it supports copyright and plagiarism investigations, aids detection of harmful content, offers exploratory cues for certain medical conditions (e.g., early signs of dementia or depression), provides historical context for literary works, and helps uncover misinformation and disinformation. However, when stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can be weaponized for malicious purposes. Techniques such as de-anonymization, re-identification, tracking, profiling, and downstream effects like censorship illustrate the privacy threats that stylometric analysis can enable. Building on these concerns, this paper further explores how adversarial stylometry combined with steganography can counteract stylometric analysis. We first present enhancements to our adversarial attack, $\textit{TraceTarnish}$, providing stronger evidence of its capacity to confound stylometric systems and reduce their attribution and verification accuracy. Next, we examine how steganographic embedding can be fine-tuned to mask an author's stylistic fingerprint, quantifying the level of authorship obfuscation achievable as a function of the proportion of words altered with zero-width Unicode characters. Based on our findings, steganographic coverage of 33% or higher appears sufficient to achieve authorship obfuscation. Finally, we reflect on the ways stylometry can be used to undermine privacy and argue for the necessity of defensive tools like $\textit{TraceTarnish}$.
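To make the notion of steganographic coverage concrete, the following is a minimal Python sketch of altering a chosen proportion of words with a zero-width Unicode character. It is an illustration under stated assumptions, not the $\textit{TraceTarnish}$ implementation: the function names `embed_zero_width` and `measured_coverage`, the use of U+200B (ZERO WIDTH SPACE), the placement after a word's first character, and the seeded random choice of words are all hypothetical choices made for this example.

```python
import random

# Assumption: U+200B (ZERO WIDTH SPACE) stands in for whichever zero-width
# characters the paper actually uses; the abstract does not name them.
ZERO_WIDTH_SPACE = "\u200b"


def embed_zero_width(text: str, coverage: float, seed: int = 0) -> str:
    """Insert one zero-width character into roughly `coverage` of the words.

    `coverage` is the proportion of non-empty, space-separated words to alter
    (for example, 0.33 for 33%). Word selection is random but reproducible
    through `seed`. This is a hypothetical helper for illustration only.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")

    words = text.split(" ")
    # Only non-empty tokens count as words (consecutive spaces yield "").
    eligible = [i for i, word in enumerate(words) if word]
    n_alter = round(coverage * len(eligible))
    chosen = set(random.Random(seed).sample(eligible, n_alter))

    altered = []
    for i, word in enumerate(words):
        if i in chosen:
            # Assumed placement: directly after the first character.
            word = word[:1] + ZERO_WIDTH_SPACE + word[1:]
        altered.append(word)
    return " ".join(altered)


def measured_coverage(text: str) -> float:
    """Return the fraction of non-empty words containing the zero-width mark."""
    words = [word for word in text.split(" ") if word]
    if not words:
        return 0.0
    return sum(ZERO_WIDTH_SPACE in word for word in words) / len(words)
```

Under these assumptions, calling `embed_zero_width(text, 0.33)` yields text that renders the same on screen in most environments but differs at the character level in about a third of its words. Removing every U+200B from the output recovers the original string.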
Similar Papers
Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
Cryptography and Security
Finds who wrote online messages, even if hidden.
Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Cryptography and Security
Makes it harder to tell who wrote a message.
A stylometric analysis of speaker attribution from speech transcripts
Computation and Language
Identifies speakers by what they say, not just how.