Anthropic
Corporate • 🇺🇸 United States
Big Tech
Papers (L12M)
9
Researchers
10
Papers w/ Code
3
Papers w/ Dataset
0
Topic Overview
Bubble chart placeholder
Recent Papers
Unsupervised decoding of encoded reasoning using language model interpretability
Code
Artificial Intelligence
Natural Emergent Misalignment from Reward Hacking in Production RL
Artificial Intelligence
Steering Language Models with Weight Arithmetic
Code
Computation and Language
Evaluating Control Protocols for Untrusted AI Agents
Artificial Intelligence
Agentic Misalignment: How LLMs Could Be Insider Threats
Code
Cryptography and Security