Context-aware Fairness Evaluation and Mitigation in LLMs
By: Afrozah Nadeem, Mark Dras, Usman Naseem
Potential Business Impact:
Makes AI fairer and less harmful.
Large language models often exhibit undesirable behaviors encoded in their internal representations, including unfair outputs, inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialogue. Although training-time or data-centric methods attempt to reduce these effects, they are computationally expensive, irreversible once deployed, and slow to adapt to new conversational contexts. Pruning-based methods provide a flexible and transparent way to reduce bias by adjusting the neurons responsible for certain behaviors. However, most existing approaches are static: once a neuron is removed, the model loses the ability to adapt when the conversation or context changes. To address this, we propose a dynamic, reversible, pruning-based framework that detects context-aware neuron activations and applies adaptive masking to modulate their influence during generation. Our inference-time solution provides fine-grained, memory-aware mitigation while preserving knowledge and maintaining coherent behavior across multilingual single- and multi-turn dialogues, enabling dynamic fairness control in real-world conversational AI.
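To make the idea of reversible, inference-time masking concrete, the sketch below shows one way such a mechanism could be wired into a PyTorch transformer using forward hooks. It is a minimal illustration under assumptions, not the authors' implementation: the `bias_scores` tensor and `THRESHOLD` are hypothetical placeholders standing in for the paper's context-aware detection step, and GPT-2 is used only as a convenient small model.

```python
# Minimal sketch of reversible, inference-time neuron masking (illustrative only).
# `bias_scores` and `THRESHOLD` are hypothetical stand-ins for the paper's
# context-aware detection step, not its actual method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Pick one MLP projection and attach hypothetical per-neuron bias scores.
layer = model.transformer.h[6].mlp.c_fc      # intermediate projection (3072 units)
bias_scores = torch.rand(layer.nf)            # placeholder for detected activations
THRESHOLD = 0.95                              # placeholder sensitivity

# Soft mask: flagged neurons are attenuated, not permanently removed.
mask = (bias_scores <= THRESHOLD).float()

def masking_hook(module, inputs, output):
    # Scale activations of flagged neurons; removing the hook restores the model.
    return output * mask.to(output.device, output.dtype)

handle = layer.register_forward_hook(masking_hook)

prompt = "The nurse said that"
ids = tokenizer(prompt, return_tensors="pt")
masked_out = model.generate(**ids, max_new_tokens=20)
print(tokenizer.decode(masked_out[0], skip_special_tokens=True))

handle.remove()  # reversible: the original, unmasked behavior is restored
```

Because the mask lives in a hook rather than in edited weights, it can be swapped or dropped per conversation turn, which is the property that distinguishes this style of mitigation from static pruning.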
Similar Papers
Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing
Artificial Intelligence
Makes AI fairer by hiding biased thoughts.
Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation
Computation and Language
Makes AI less likely to be unfair or biased.
Pruning Strategies for Backdoor Defense in LLMs
Machine Learning (CS)
Cleans smart language tools from hidden tricks.