SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals

Published: December 4, 2025 | arXiv ID: 2512.05038v1

By: Cassandra Goldberg, Chaehyeon Kim, Adam Stein and more

Potential Business Impact:

Finds hidden meaning in computer "thoughts."

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their utility is often limited by noisy and inconsistent activations. In this work, we uncover a clear pattern within the noise, which we term the SuperActivator Mechanism: while in-concept and out-of-concept activations overlap considerably, the token activations in the extreme high tail of the in-concept distribution provide a reliable signal of concept presence. We demonstrate the generality of this mechanism by showing that SuperActivator tokens consistently outperform standard vector-based and prompting concept detection approaches, achieving up to a 14% higher F1 score across image and text modalities, model architectures, model layers, and concept extraction techniques. Finally, we leverage SuperActivator tokens to improve feature attributions for concepts.
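
To make the tail-based idea concrete, here is a minimal sketch of concept detection that looks only at the high tail of in-concept activations. It is an illustration under stated assumptions, not the authors' implementation: the function names (`activations`, `tail_threshold`, `detect_concept`), the dot-product activation, the `tail_fraction` parameter, and the "any token at or above the threshold" decision rule are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def activations(token_embeddings: Sequence[Sequence[float]],
                concept_vector: Sequence[float]) -> List[float]:
    """Score each token by its dot product with the concept vector (assumed activation)."""
    return [sum(x * c for x, c in zip(token, concept_vector))
            for token in token_embeddings]


def tail_threshold(in_concept_activations: Sequence[float],
                   tail_fraction: float = 0.1) -> float:
    """Return the smallest activation inside the top `tail_fraction` of in-concept tokens."""
    if not in_concept_activations:
        raise ValueError("need at least one in-concept activation")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    ordered = sorted(in_concept_activations, reverse=True)
    # Keep at least one token in the tail, even for tiny calibration sets.
    k = max(1, int(len(ordered) * tail_fraction))
    return ordered[k - 1]


def detect_concept(token_activations: Sequence[float], threshold: float) -> bool:
    """Flag a sample as containing the concept if any token reaches the tail threshold."""
    return any(a >= threshold for a in token_activations)
```

In use, one would calibrate `tail_threshold` on activations from samples known to contain the concept, then call `detect_concept` on the token activations of a new sample. The design point this sketch tries to capture is that a decision based on the few highest-activating tokens ignores the overlapping bulk of the distribution, whereas averaging over all tokens would mix that noisy bulk back in.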

Country of Origin
🇺🇸 United States

Page Count
67 pages

Category
Computer Science:
Machine Learning (CS)