February 7, 2025 · Open Access

A Survey of DeepSeek Models

Abstract

Advances in artificial intelligence (AI) depend on systems capable of human-like reasoning, an area where conventional Large Language Models (LLMs) fall short: they struggle with multi-step logic, abstract conceptualization, and latent relationship inference. DeepSeek AI addresses these challenges through computationally efficient architectures, including the DeepSeek Mixture-of-Experts (MoE) framework, which reduces inference costs while maintaining performance. Its model family includes DeepSeek v3, a general-purpose LLM optimized for instruction following and reasoning; DeepSeek Coder, for code generation and software engineering; DeepSeek Math, for symbolic and quantitative reasoning; DeepSeek R1-Zero, trained with pure reinforcement learning (RL) and no supervised fine-tuning (SFT); and DeepSeek R1, designed for cross-domain problem-solving with minimal fine-tuning. By open-sourcing hardware-agnostic implementations, DeepSeek broadens access to high-performance AI. This paper surveys DeepSeek's architectural advancements and compares their features and limitations with those of state-of-the-art LLMs. It also explores DeepSeek's impact on AI research and discusses potential directions for future work.
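The abstract credits the MoE framework with lower inference cost. The general mechanism behind that claim is sparse routing: a learned gate scores every expert, but only the top-k experts actually run for a given input, so compute scales with k rather than with the total number of experts. The Python sketch below illustrates generic top-k gating under simplifying assumptions (experts as plain functions, a linear router, softmax renormalized over the selected experts). The names `moe_forward`, `softmax`, and the toy weights are hypothetical and chosen for illustration; this is not DeepSeek's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: Sequence[Vector],
                experts: Sequence[Expert], k: int = 2) -> Tuple[Vector, List[int]]:
    """Route x to its top-k experts and return their gate-weighted output.

    Illustrative sketch only (hypothetical API). Only k of len(experts)
    expert functions are evaluated, which is where the compute savings
    over a dense layer of the same total size come from.
    """
    # Router: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Select the indices of the k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only the selected experts are executed
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top


if __name__ == "__main__":
    # Four toy "experts": each one scales its input by a constant.
    experts = [lambda v, c=c: [c * t for t in v] for c in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[1.0, 0.0], [0.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
    out, chosen = moe_forward([1.0, 1.0], gate_weights, experts, k=2)
    print(chosen)  # [2, 0]: only experts 2 and 0 were evaluated
    print(out)
```

Renormalizing over only the chosen experts keeps the output a convex combination of the experts that ran; other designs apply the softmax over all experts before selecting. Production MoE layers, including DeepSeek's, add refinements that this toy omits.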
