Mechanistic Analysis of Circuit Preservation in Federated Learning
By: Muhammad Haseeb, Salaar Masood, Muhammad Abdullah Sohail
Federated Learning (FL) enables collaborative training of models on decentralized data, but its performance degrades significantly under Non-IID (non-independent and identically distributed) data conditions. While this accuracy loss is well-documented, its internal mechanistic causes remain a black box. This paper investigates the canonical FedAvg algorithm through the lens of Mechanistic Interpretability (MI) to diagnose this failure mode. We hypothesize that aggregating conflicting client updates leads to circuit collapse: destructive interference among the functional, sparse sub-networks responsible for specific class predictions. By training inherently interpretable, weight-sparse neural networks within an FL framework, we identify and track these circuits across clients and communication rounds. Using Intersection-over-Union (IoU) to quantify circuit preservation, we provide the first mechanistic evidence that Non-IID data distributions cause structurally distinct local circuits to diverge, leading to their degradation in the global model. Our findings reframe the problem of statistical drift in FL as a concrete, observable failure of mechanistic preservation, paving the way for more targeted solutions.
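As a rough illustration of the measurement described above, the sketch below computes IoU between circuits and applies a FedAvg-style weighted average to toy sparse weights. It is not the authors' implementation. The edge encoding `(layer, output unit, input unit)`, the magnitude-threshold rule in `extract_circuit`, the threshold value, and all weights and client sizes are assumptions made up for this example; the abstract does not say how circuits are actually identified.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of one connection: (layer name, output unit, input unit).
Edge = Tuple[str, int, int]


def circuit_iou(a: Set[Edge], b: Set[Edge]) -> float:
    """Intersection-over-Union of two circuits given as sets of edges."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty circuits count as identical
    return len(a & b) / len(union)


def extract_circuit(weights: Dict[Edge, float], threshold: float) -> Set[Edge]:
    """Keep edges whose absolute weight reaches `threshold` (an assumed criterion)."""
    return {edge for edge, w in weights.items() if abs(w) >= threshold}


def fedavg(client_weights: List[Dict[Edge, float]],
           client_sizes: List[int]) -> Dict[Edge, float]:
    """Sample-size-weighted average of client weights; missing edges count as 0."""
    total = sum(client_sizes)
    edges = set().union(*client_weights)
    return {
        e: sum(w.get(e, 0.0) * n for w, n in zip(client_weights, client_sizes)) / total
        for e in edges
    }


# Toy clients (made-up numbers): they share one edge and each has one private edge.
client_a = {("fc1", 0, 0): 1.0, ("fc1", 0, 1): 0.8}
client_b = {("fc1", 0, 0): 1.0, ("fc1", 1, 2): -0.9}

threshold = 0.5  # assumed cutoff for "belongs to the circuit"
global_weights = fedavg([client_a, client_b], [100, 100])

local_a = extract_circuit(client_a, threshold)
local_b = extract_circuit(client_b, threshold)
global_circuit = extract_circuit(global_weights, threshold)

print("IoU(client A, client B):", circuit_iou(local_a, local_b))
print("IoU(client A, global):  ", circuit_iou(local_a, global_circuit))
```

In this toy setup, averaging halves the weight of each edge that only one client uses, so those edges fall below the cutoff and drop out of the global circuit. This only illustrates the kind of interference the abstract hypothesizes; it says nothing about the paper's actual results.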