Representation-Aware Unlearning via Activation Signatures: From Suppression to Knowledge-Signature Erasure

Published: January 15, 2026 | arXiv ID: 2601.10566v1

By: Syed Naveed Mahmood, Md. Rezaur Rahman Bhuiyan, Tasfia Zaman and more

Selective knowledge erasure from LLMs is critical for GDPR compliance and model safety, yet current unlearning methods conflate behavioral suppression with true knowledge removal, allowing latent capabilities to persist beneath surface-level refusals. In this work, we address this challenge by introducing the Knowledge Immunization Framework (KIF), a representation-aware architecture that distinguishes genuine erasure from obfuscation by targeting internal activation signatures rather than surface outputs. Our approach combines dynamic suppression of subject-specific representations with parameter-efficient adaptation, enabling durable unlearning without full model retraining. KIF achieves near-oracle erasure (FQ ≈ 0.99 vs. 1.00) while preserving utility at oracle levels (MU = 0.62), effectively breaking the stability-erasure tradeoff that has constrained all prior work. We evaluate both standard foundation models (Llama and Mistral) and reasoning-prior models (Qwen and DeepSeek) across 3B to 14B parameters. We observe that standard models exhibit scale-independent true erasure (<3% utility drift), while reasoning-prior models reveal fundamental architectural divergence. Our comprehensive dual-metric evaluation protocol, which combines surface-level leakage with latent trace persistence, operationalizes the obfuscation-erasure distinction and enables the first systematic diagnosis of mechanism-level forgetting behavior across model families and scales.
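
The obfuscation-erasure distinction drawn by the dual-metric protocol can be illustrated with a small sketch. This is not the paper's implementation: the names `ForgetReport` and `diagnose`, the two score definitions, and the 0.05 tolerances are hypothetical, chosen only to show how a surface-level score and a latent score might be read together.

```python
from dataclasses import dataclass


@dataclass
class ForgetReport:
    # Hypothetical scores in [0, 1]; the paper's actual metrics may differ.
    surface_leakage: float  # share of forget-set prompts whose output reveals the target
    latent_trace: float     # strength of the target's signal recovered from activations


def diagnose(report: ForgetReport, leak_tol: float = 0.05, trace_tol: float = 0.05) -> str:
    """Label an unlearning outcome using illustrative (assumed) tolerances."""
    if report.surface_leakage > leak_tol:
        return "retained"    # the model still outputs the knowledge
    if report.latent_trace > trace_tol:
        return "obfuscated"  # outputs look clean, but activations still encode it
    return "erased"          # neither outputs nor activations reveal it


# Clean outputs with a strong latent trace: suppression, not removal.
print(diagnose(ForgetReport(surface_leakage=0.01, latent_trace=0.40)))  # prints "obfuscated"
```

Checking surface leakage first reflects the idea that a model which still outputs the target has not forgotten it, whatever its activations look like; the latent score only matters once the outputs are clean.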

Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models

Computation and Language

Makes AI forget bad or wrong information.

27 Feb 2025 2

89%

Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?

Machine Learning (CS)

Removes unwanted info from AI, making it safer.

5 May 2025 1

88%

The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation

Cryptography and Security

Makes AI forget unwanted information better.

22 Dec 2025 0

