Delta Activations: A Representation for Finetuned Large Language Models
By: Zhiqiu Xu, Amish Sethi, Mayur Naik, and more
Potential Business Impact:
Organizes AI models by how they learn.
The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrate desirable properties: they are robust across finetuning settings and exhibit an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and we further explore their use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models. Code is available at https://github.com/OscarXZQ/delta_activations.
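The abstract's core idea can be sketched numerically: embed a finetuned model as the mean shift of its hidden activations, relative to the base model, over a shared set of probe inputs. The sketch below is a minimal illustration with synthetic activation matrices, not the paper's actual pipeline; the function name `delta_activation` and the toy shift vectors are assumptions for demonstration. It also shows the additive property the abstract mentions: a model finetuned on a mixture of two datasets lands near the average of the two single-domain embeddings.

```python
import numpy as np

def delta_activation(base_acts, finetuned_acts):
    """Embed a finetuned model as the mean activation shift vs. the base model.

    base_acts, finetuned_acts: (num_probes, hidden_dim) arrays of hidden-state
    activations collected on the same probe prompts.
    """
    return (finetuned_acts - base_acts).mean(axis=0)

# Toy illustration with synthetic activations (hidden_dim=4, 8 probes).
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 4))

# Pretend finetuning on each domain shifts activations along one direction.
math_model = base + np.array([1.0, 0.0, 0.0, 0.0])
code_model = base + np.array([0.0, 1.0, 0.0, 0.0])
mix_model = base + np.array([0.5, 0.5, 0.0, 0.0])  # mixed-data finetune

v_math = delta_activation(base, math_model)
v_code = delta_activation(base, code_model)
v_mix = delta_activation(base, mix_model)

# Additivity: the mixed model's embedding is close to the average
# of the two single-domain embeddings.
assert np.allclose(v_mix, 0.5 * (v_math + v_code))
```

In practice the activations would come from running both models on the same probe prompts and extracting a chosen layer's hidden states; the synthetic shifts here simply make the clustering and additivity behavior easy to see.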
Similar Papers
Language models' activations linearly encode training-order recency
Machine Learning (CS)
Computers remember when they learned new facts.
SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals
Machine Learning (CS)
Finds hidden meaning in computer "thoughts."
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Computation and Language
Finds hidden training in AI models.