Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
By: Robert Dilworth
Potential Business Impact:
Shows how writing style can reveal who wrote an online message, and how Unicode steganography can help hide it.
When using a public communication channel -- whether formal or informal, such as commenting or posting on social media -- end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes the utmost precautions to anonymize their online presence -- using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting -- one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to establish their actual identity, right? Wrong. The content of their message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss an antithetical counter-strategy in adversarial stylometry, and devise enhancements through Unicode steganography.
Similar Papers
StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching
Cryptography and Security
Hides writing style to protect author privacy.
Stylomech: Unveiling Authorship via Computational Stylometry in English and Romanized Sinhala
Computation and Language
Identifies who wrote online text, even in Sinhala.
Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Cryptography and Security
Makes it harder to tell who wrote a message.