luofuli committed on
Commit f1e9ce2
1 Parent(s): e67c544

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -3,13 +3,13 @@
 <!-- markdownlint-disable no-duplicate-header -->

 <div align="center">
-  <img src="figures/logo.svg" width="60%" alt="DeepSeek LLM" />
+  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg" width="60%" alt="DeepSeek LLM" />
 </div>
 <hr>
 <div align="center">

 <a href="https://www.deepseek.com/" target="_blank">
-  <img alt="Homepage" src="figures/badge.svg" />
+  <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg" />
 </a>
 <a href="https://chat.deepseek.com/" target="_blank">
   <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20LLM-536af5?color=536af5&logoColor=white" />
@@ -25,7 +25,7 @@
 <a href="https://discord.gg/Tc7c45Zzu5" target="_blank">
   <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" />
 </a>
-<a href="figures/qr.jpeg" target="_blank">
+<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg" target="_blank">
   <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" />
 </a>
 <a href="https://twitter.com/deepseek_ai" target="_blank">
@@ -66,8 +66,8 @@ Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) langua
 <p align="center">

 <div style="display: flex; justify-content: center;">
-  <img src="figures/activationparameters.png" style="height:300px; width:auto; margin-right:10px">
-  <img src="figures/trainingcost.png" style="height:300px; width:auto; margin-left:10px">
+  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/activationparameters.png" style="height:300px; width:auto; margin-right:10px">
+  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/trainingcost.png" style="height:300px; width:auto; margin-left:10px">
 </div>
 </p>
 We pretrained DeepSeek-V2 on a diverse and high-quality corpus comprising 8.1 trillion tokens. This comprehensive pretraining was followed by a process of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unleash the model's capabilities. The evaluation results validate the effectiveness of our approach as DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluation.
@@ -107,7 +107,7 @@ For more evaluation details, such as few-shot settings and prompts, please check

 #### Context Window
 <p align="center">
-  <img width="80%" src="figures/niah.png">
+  <img width="80%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/niah.png">
 </p>

 Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to **128K**.
@@ -133,7 +133,7 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests. DeepSeek-V2 pe
 #### English Open Ended Generation Evaluation
 We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
 <p align="center">
-  <img width="50%" src="figures/mtbench.png" />
+  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/mtbench.png" />
 </p>

 #### Chinese Open Ended Generation Evaluation
@@ -160,7 +160,7 @@ We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive per
 We evaluate our model on LiveCodeBench (0901-0401), a benchmark designed for live coding challenges. As illustrated, DeepSeek-V2 demonstrates considerable proficiency in LiveCodeBench, achieving a Pass@1 score that surpasses several other sophisticated models. This performance highlights the model's effectiveness in tackling live coding tasks.

 <p align="center">
-  <img width="50%" src="figures/code_benchmarks.png">
+  <img width="50%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/code_benchmarks.png">
 </p>

 ## 4. Model Architecture
@@ -169,7 +169,7 @@ DeepSeek-V2 adopts innovative architectures to guarantee economical training and
 - For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.

 <p align="center">
-  <img width="90%" src="figures/architecture.png" />
+  <img width="90%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/architecture.png" />
 </p>

 ## 5. Chat Website
@@ -180,7 +180,7 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c


 <p align="center">
-  <img width="40%" src="figures/model_price.png">
+  <img width="40%" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/model_price.png">
 </p>

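The change is purely mechanical: every relative `figures/...` path in README.md is rewritten to the corresponding URL under `https://github.com/deepseek-ai/DeepSeek-V2/blob/main/`. Below is a minimal sketch of how the same rewrite could be scripted rather than applied by hand; the script, its function name, and the regex are illustrative assumptions and are not part of this commit or the repository.

```python
# Hypothetical helper that reproduces the substitutions in this commit:
# point src="figures/..." and href="figures/..." at the GitHub repository
# so the images resolve outside a local checkout of the repo.
import re
from pathlib import Path

BASE = "https://github.com/deepseek-ai/DeepSeek-V2/blob/main/"

def absolutize_figures(markdown: str) -> str:
    """Rewrite relative figures/ references to absolute GitHub URLs."""
    return re.sub(
        r'(src|href)="(figures/[^"]+)"',
        lambda m: f'{m.group(1)}="{BASE}{m.group(2)}"',
        markdown,
    )

if __name__ == "__main__":
    readme = Path("README.md")
    readme.write_text(
        absolutize_figures(readme.read_text(encoding="utf-8")),
        encoding="utf-8",
    )
```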