NEAT: Concept driven Neuron Attribution in LLMs
By: Vivek Hruday Kavuri, Gargi Shroff, Rahul Mishra
Potential Business Impact:
Finds AI's "thinking" parts to fix bias.
Locating the neurons responsible for final predictions is important for opening up black-box large language models and understanding their internal mechanisms. Previous studies have tried to find mechanisms that operate at the neuron level, but these methods fail to represent a concept, and there is scope to further reduce the compute they require. In this paper, we use concept vectors to locate the significant neurons responsible for representing certain concepts, and we term these neurons concept neurons. If the number of neurons is n and the number of examples is m, we reduce the number of forward passes required from O(n*m) in previous work to O(n), cutting the time and computation required. We also compare our method with several baselines and previous methods; our results show better performance than most of them, and our method is more efficient than the state-of-the-art method. As part of our ablation studies, we also try to optimize the search for concept neurons using clustering methods. Finally, we apply our method to find concept neurons, turn them off, and analyze the implications for hate speech and bias in LLMs; we also evaluate bias in the Indian context. Our methodology, analysis, and explanations facilitate understanding of neuron-level responsibility for broader, more human-like concepts and lay a path for future research on finding concept neurons and intervening on them.
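The forward-pass counts in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the concept vector is built or how neurons are scored. The sketch assumes a two-layer MLP with random weights, a concept vector taken as the mean of the example inputs, and a cosine-based ablation score. All names (`forward`, `cosine`, `passes`, the array shapes) are hypothetical. It only shows why scoring each neuron against one pooled vector needs n ablated passes, while scoring against every example needs n*m.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, n_neurons, d_out, m_examples = 8, 16, 4, 32

# Toy model and data (assumptions for illustration only).
W1 = rng.normal(size=(d_in, n_neurons))
W2 = rng.normal(size=(n_neurons, d_out))
X = rng.normal(size=(m_examples, d_in))  # hypothetical examples of one concept

# Counts ablated forward passes for each strategy.
passes = {"per_example": 0, "concept_vector": 0}


def forward(x, ablate=None, counter=None):
    """One forward pass for a single input; optionally zero one hidden neuron."""
    if counter is not None:
        passes[counter] += 1
    h = np.maximum(x @ W1, 0.0)
    if ablate is not None:
        h = h.copy()
        h[ablate] = 0.0
    return h @ W2


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Strategy A: ablate each neuron and re-run every example (n * m ablated passes).
clean_outputs = [forward(x) for x in X]
scores_per_example = np.zeros(n_neurons)
for j in range(n_neurons):
    drops = [
        1.0 - cosine(clean, forward(x, ablate=j, counter="per_example"))
        for x, clean in zip(X, clean_outputs)
    ]
    scores_per_example[j] = np.mean(drops)

# Strategy B: pool the examples into one vector first (assumed: mean of inputs),
# then ablate each neuron once against that vector (n ablated passes).
concept_vector = X.mean(axis=0)
clean_concept = forward(concept_vector)
scores_concept = np.array([
    1.0 - cosine(clean_concept, forward(concept_vector, ablate=j, counter="concept_vector"))
    for j in range(n_neurons)
])

print(passes)
```

With these toy sizes the per-example strategy runs 16 * 32 ablated passes and the pooled strategy runs 16. The two score vectors need not rank neurons identically; whether pooling preserves the ranking is an empirical question that a sketch like this cannot settle.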
Similar Papers
Identifying Good and Bad Neurons for Task-Level Controllable LLMs
Computation and Language
Finds good and bad brain cells in AI.
Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models
Neurons and Cognition
Finds brain-like networks inside AI.
Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?
Software Engineering
Helps computers understand and use different programming languages.