Exploring How LLMs Capture and Represent Domain-Specific Knowledge
By: Mirian Hipolito Garcia, Camille Couturier, Daniel Madrigal Diaz, and more
Potential Business Impact:
Helps computers pick the best AI for each job.
We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains. We also study the robustness of these domain representations to variations in prompt styles and sources. Our approach leverages these representations for model selection, mapping each input query to the LLM that best matches its domain trace (i.e., the model with the highest performance on similar traces). Our findings show that LLMs can distinguish queries from related domains, and that the fine-tuned model is not always the most accurate. Unlike previous work, our interpretations apply to both closed and open-ended generative tasks.
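To make the probing-and-routing idea concrete, here is a minimal sketch of one way it could be implemented. It is not the paper's code: the checkpoint name, the choice of layer, mean pooling over tokens, nearest-centroid matching with cosine similarity, and the routing table of candidate LLMs are all illustrative assumptions. It shows the two steps the abstract describes: extracting hidden states during the prefill forward pass, then mapping a query's trace to the candidate model associated with the most similar domain traces.

```python
# Minimal sketch, assuming a Hugging Face transformers probe model ("gpt2"),
# mean-pooled mid-layer hidden states, and nearest-centroid routing.
# All names, layers, and routing entries below are illustrative, not the paper's.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical probe model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def prefill_embedding(query: str, layer: int = 6) -> np.ndarray:
    """Mean-pooled hidden state of one layer from the prefill forward pass."""
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model(**inputs, output_hidden_states=True)
    # hidden_states is a tuple of (num_layers + 1) tensors of shape [batch, seq, dim]
    return outputs.hidden_states[layer][0].mean(dim=0).numpy()

# Toy labeled queries per domain (placeholders, not the paper's data).
domain_queries = {
    "medical": ["What are the side effects of metformin?"],
    "legal":   ["Is a verbal contract enforceable?"],
    "code":    ["How do I reverse a linked list in Python?"],
}

# Centroid of each domain's traces in hidden-state space.
centroids = {
    d: np.mean([prefill_embedding(q) for q in qs], axis=0)
    for d, qs in domain_queries.items()
}

# Hypothetical routing table: the paper selects the model with the highest
# performance on similar traces; these assignments are made up for the sketch.
best_model_for = {"medical": "med-tuned-llm", "legal": "general-llm", "code": "code-llm"}

def route(query: str) -> str:
    """Map a query to the candidate LLM for its nearest domain centroid."""
    emb = prefill_embedding(query)
    sims = {
        d: float(np.dot(emb, c) / (np.linalg.norm(emb) * np.linalg.norm(c)))
        for d, c in centroids.items()
    }
    return best_model_for[max(sims, key=sims.get)]

print(route("Can my landlord raise the rent mid-lease?"))
```

In practice, a routing scheme like this would use many labeled queries per domain and pick the layer whose traces separate domains best; the centroid-plus-cosine step here stands in for whatever trace-similarity measure the full method uses.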
Similar Papers
Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation
Computation and Language
Helps computers translate tricky medical and legal words.
Assessing the Capability of Large Language Models for Domain-Specific Ontology Generation
Artificial Intelligence
Builds smart knowledge maps for any topic.
Domain Specific Benchmarks for Evaluating Multimodal Large Language Models
Machine Learning (CS)
Organizes AI tests for different subjects.