Sycophancy Is Not One Thing: Causal Separation of Sycophantic Behaviors in LLMs

Published: September 25, 2025 | arXiv ID: 2509.21305v1

By: Daniel Vennemeyer, Phan Anh Duong, Tiffany Zhan and more

Potential Business Impact:

Could let developers reduce an AI's flattery and undue agreement separately, without suppressing genuine agreement.

Business Areas:
Professional Networking Community and Lifestyle, Professional Services

Large language models (LLMs) often exhibit sycophantic behaviors -- such as excessive agreement with or flattery of the user -- but it is unclear whether these behaviors arise from a single mechanism or multiple distinct processes. We decompose sycophancy into sycophantic agreement and sycophantic praise, contrasting both with genuine agreement. Using difference-in-means directions, activation additions, and subspace geometry across multiple models and datasets, we show that: (1) the three behaviors are encoded along distinct linear directions in latent space; (2) each behavior can be independently amplified or suppressed without affecting the others; and (3) their representational structure is consistent across model families and scales. These results suggest that sycophantic behaviors correspond to distinct, independently steerable representations.
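The two core operations named in the abstract, difference-in-means directions and activation additions, can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`diff_in_means`, `add_activation`, and so on) are hypothetical, and the three-dimensional "activations" are made-up toy values chosen so the two directions come out orthogonal. In practice the vectors would be hidden states read from a model layer.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def diff_in_means(pos, neg):
    """Direction = mean(activations with behavior) - mean(activations without)."""
    mp, mn = mean_vector(pos), mean_vector(neg)
    return [a - b for a, b in zip(mp, mn)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity, a simple probe of how aligned two directions are."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def add_activation(h, direction, alpha):
    """Activation addition: shift hidden state h by alpha along a direction."""
    return [x + alpha * d for x, d in zip(h, direction)]

# Toy activations (assumed values, not data from the paper).
syco_agree = [[2.0, 0.0, 0.1], [2.2, 0.1, -0.1]]
base_agree = [[0.0, 0.0, 0.1], [0.2, 0.1, -0.1]]
syco_praise = [[0.1, 3.0, 0.0], [-0.1, 3.2, 0.2]]
base_praise = [[0.1, 0.0, 0.0], [-0.1, 0.2, 0.2]]

d_agree = normalize(diff_in_means(syco_agree, base_agree))
d_praise = normalize(diff_in_means(syco_praise, base_praise))

# Suppress the "agreement" direction in a hidden state, then check that
# its projection onto the "praise" direction is left unchanged.
h = [0.5, 0.5, 0.5]
steered = add_activation(h, d_agree, alpha=-1.0)

print("cosine(agree, praise):", round(cosine(d_agree, d_praise), 6))
print("praise projection before/after:", dot(h, d_praise), dot(steered, d_praise))
```

In this toy setup the directions are orthogonal by construction, so steering along one leaves the projection onto the other untouched. Whether real model activations behave this way is the empirical question the paper addresses.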

Country of Origin
🇺🇸 United States

Page Count
24 pages

Category
Computer Science:
Computation and Language