Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation
By: Kazuki Iwahana, Yusuke Yamasaki, Akira Ito, and more
Potential Business Impact:
Fixes AI that was tricked by bad data.
Backdoor attacks pose a critical threat to machine learning models, causing them to behave normally on clean data but misclassify poisoned data into a poisoned class. Existing defenses often attempt to identify and remove backdoor neurons based on Trigger-Activated Changes (TAC), the activation differences between clean and poisoned data. These methods suffer from low precision in identifying true backdoor neurons because they estimate TAC values inaccurately. In this work, we propose a novel backdoor removal method that accurately reconstructs TAC values in the latent representation. Specifically, we formulate the minimal perturbation that forces clean data to be classified into a specific class as a convex quadratic optimization problem, whose optimal solution serves as a surrogate for TAC. We then identify the poisoned class by detecting statistically small $L^2$ norms of these perturbations and leverage the perturbation of the poisoned class during fine-tuning to remove backdoors. Experiments on CIFAR-10, GTSRB, and TinyImageNet demonstrate that our approach consistently achieves superior backdoor suppression with high clean accuracy across attack types, datasets, and architectures, outperforming existing defense methods.
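To make the abstract's pipeline concrete, below is a minimal sketch (not the authors' implementation) of the two reconstruction and detection steps, assuming a linear classification head W, b over the latent representation h. Under that assumption, the smallest latent perturbation that forces a target class is a convex quadratic program; the cvxpy formulation, the margin value, and the MAD-based outlier test for "statistically small" norms are illustrative choices, not details taken from the paper.

```python
# Sketch of: (1) per-class minimal latent perturbation as a convex QP (a surrogate
# for TAC), and (2) poisoned-class detection via unusually small L2 perturbation
# norms. Assumes logits = W @ (h + delta) + b with W of shape (num_classes, dim).
import numpy as np
import cvxpy as cp


def minimal_perturbation(h, W, b, target, margin=0.0):
    """Smallest L2 perturbation delta such that class `target` wins over every
    other class for the perturbed latent h + delta (convex QP, linear head)."""
    num_classes, dim = W.shape
    delta = cp.Variable(dim)
    constraints = [
        (W[target] - W[j]) @ (h + delta) + (b[target] - b[j]) >= margin
        for j in range(num_classes) if j != target
    ]
    cp.Problem(cp.Minimize(cp.sum_squares(delta)), constraints).solve()
    return delta.value


def detect_poisoned_class(latents, W, b, z_threshold=-2.5):
    """Average per-class perturbation norms over clean latents and flag the class
    whose average norm is an unusually small outlier (robust z-score, assumed test)."""
    num_classes = W.shape[0]
    avg_norms = np.array([
        np.mean([np.linalg.norm(minimal_perturbation(h, W, b, c)) for h in latents])
        for c in range(num_classes)
    ])
    median = np.median(avg_norms)
    mad = np.median(np.abs(avg_norms - median)) + 1e-12
    z_scores = (avg_norms - median) / (1.4826 * mad)
    candidate = int(np.argmin(z_scores))
    return (candidate if z_scores[candidate] < z_threshold else None), avg_norms
```

Per the abstract, the perturbations computed for the detected class would then be used during fine-tuning to suppress the backdoor; that step is omitted here.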
Similar Papers
Backdoor Unlearning by Linear Task Decomposition
Machine Learning (CS)
Cleans computer "brains" without breaking them.
Illuminating the Black Box: Real-Time Monitoring of Backdoor Unlearning in CNNs via Explainable AI
Cryptography and Security
Cleans computer brains of hidden bad instructions.
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning
Cryptography and Security
Makes computer "cheats" disappear after they're used.