DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
By: DeepSeek-AI, Aixin Liu, Aoxue Mei, and more
Potential Business Impact:
Cuts the compute cost of long-context reasoning and improves tool-using agents, making frontier-level models cheaper to run.
We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows:
(1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios (a sketch of the general top-k sparse-attention pattern follows the abstract).
(2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI).
(3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.
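To make the sparse-attention idea in (1) concrete, here is a minimal sketch of the generic top-k pattern, where each query attends only to its k highest-scoring keys. This is not DeepSeek's DSA implementation; the function name, parameters, and the dense scoring step are illustrative assumptions. In a real system a lightweight indexer would score keys cheaply so the full attention matrix is never materialized, which is where the complexity savings come from.

```python
# Illustrative top-k sparse attention (single head, causal).
# NOT DeepSeek's DSA: a hypothetical sketch of the general pattern.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    """Each query attends only to its top_k highest-scoring keys.

    q, k, v: (T, d) tensors. Once selection scores are available, the
    softmax-and-mix cost drops from O(T^2 * d) to O(T * top_k * d).
    Here we compute dense scores for clarity only; a production kernel
    would use a lightweight indexer instead of the full (T, T) matrix.
    """
    T, d = q.shape
    scores = q @ k.t() / d**0.5                          # (T, T) selection scores
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))  # causal mask
    # Threshold at the k-th largest score per row; mask everything below it.
    kth = scores.topk(min(top_k, T), dim=-1).values[:, -1:]
    sparse = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(sparse, dim=-1) @ v                 # (T, d) output

# Usage: 16 tokens, 8-dim head, each query attends to at most 4 keys.
out = topk_sparse_attention(torch.randn(16, 8), torch.randn(16, 8),
                            torch.randn(16, 8), top_k=4)
```

The design choice to keep a fixed per-query budget (top_k) is what makes the cost scale linearly in sequence length rather than quadratically, which is the property the abstract attributes to DSA in long-context scenarios.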
Similar Papers
DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models
Computation and Language
Maps where open-source DeepSeek models could help, and where they pose risks, in clinical care.
A Review of DeepSeek Models' Key Innovative Techniques
Machine Learning (CS)
Explains the design techniques that make DeepSeek models capable yet cheaper to train and run.
DeepSeek performs better than other Large Language Models in Dental Cases
Computation and Language
Suggests DeepSeek could help dentists reason through clinical cases.