zhaicunqi committed on
Commit
840d770
1 Parent(s): 6dd39aa

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +554 -3
  2. README_CN.md +564 -0
README.md CHANGED
@@ -1,3 +1,554 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ library_name: transformers
+ tags:
+ - qihoo360
+ - 奇虎360
+ - zhinao
+ - 360Zhinao
+ - pretrain
+ ---
+
+ <p align="left">
+     <a href="./README_CN.md">中文</a> | &nbsp English &nbsp
+ </p>
+ <br>
+
+ <div align="center">
+ <h1>
+   360Zhinao2 (360智脑)
+ </h1>
+ </div>
+ <div align="center">
+ 🤗 <a href="https://huggingface.co/qihoo360">HuggingFace</a>&nbsp&nbsp | &nbsp&nbsp
+ 🤖 <a href="https://www.modelscope.cn/profile/qihoo360">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp
+ 💬 <a href="./assets/WeChat.png">WeChat (微信)</a>&nbsp&nbsp
+ </div>
+ <br>
+ <p align="center">
+  Feel free to visit 360Zhinao's official website at <a href="https://ai.360.com">https://ai.360.com</a> to explore more features.
+ </p>
+
+ <br>
+
+ # Introduction
+ 🎉🎉🎉 We have released the 360Zhinao2 model series:
+ - **360Zhinao2-7B-Base**
+ - **360Zhinao2-7B-Chat-4K**
+ - **360Zhinao2-7B-Chat-32K**
+ - **360Zhinao2-7B-Chat-360K**
+
+ Notable features of our 360Zhinao models are:
+
+ - **Base Model:** We adopt the popular two-stage training method. In the first stage, we train on 10T tokens in total with a cosine learning rate schedule; in the second stage, we increase the proportion of high-quality data and train on 100B additional tokens, with the learning rate decaying directly to 0. The total training data for 360Zhinao2-7B amounts to 10.1T tokens (a schematic sketch of this schedule follows the list below).
+ - **Chat Models:** Powerful chat capabilities and three context lengths of 4K, 32K and 360K.
+
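+ For illustration only, here is a minimal sketch of what such a two-stage learning-rate schedule looks like as a function of consumed tokens. The peak learning rate, warmup budget and cosine floor are placeholder assumptions for the sketch, not the actual training configuration.
+
+ ```python
+ import math
+
+ def two_stage_lr(tokens_seen,
+                  peak_lr=3e-4,          # placeholder peak LR (assumption)
+                  min_lr=3e-5,           # cosine floor before stage 2 (assumption)
+                  warmup_tokens=10e9,    # placeholder warmup budget (assumption)
+                  stage1_tokens=10e12,   # stage 1: ~10T tokens with a cosine schedule
+                  stage2_tokens=100e9):  # stage 2: ~100B high-quality tokens, decay to 0
+     """Illustrative learning rate as a function of tokens consumed."""
+     if tokens_seen < warmup_tokens:
+         # linear warmup (assumed; the README does not describe warmup details)
+         return peak_lr * tokens_seen / warmup_tokens
+     if tokens_seen < stage1_tokens:
+         # stage 1: cosine decay from peak_lr down to min_lr over ~10T tokens
+         progress = (tokens_seen - warmup_tokens) / (stage1_tokens - warmup_tokens)
+         return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
+     # stage 2: decay directly to 0 over the final ~100B tokens
+     progress = min((tokens_seen - stage1_tokens) / stage2_tokens, 1.0)
+     return min_lr * (1 - progress)
+ ```
+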
+ <br>
+
+ # News and Updates
+ - [2024.11.18] 🔥🔥🔥We released 360Zhinao2-7B, providing both the Base model and Chat models with context lengths of 4K, 32K and 360K.
+ - [2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first in the Retrieval and Reranking tasks of the [C-MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard), respectively.
+ - [2024.05.20] We extended llama3's context window to 360k and released **llama3-8B-360Zhinao-360k-Instruct**<a href="https://huggingface.co/qihoo360/llama3-8B-360Zhinao-360k-Instruct">🤗</a>.
+ - [2024.04.12] We released **360Zhinao-7B** v1.0, including the base model and three chat models with context lengths 4K, 32K and 360K.
+ The technical report is available on [arXiv](https://arxiv.org/abs/2405.13386).
+
+ <br>
+
+ # Table of contents
+ - [Download URL](#Download-URL)
+ - [Model Evaluation](#Model-Evaluation)
+ - [Quickstart](#Quickstart)
+ - [Model Inference](#Model-Inference)
+ - [Model Finetune](#Model-Finetune)
+ - [License](#License)
+
+ <br>
+
+ # Download URL
+
+ | Size | Model | BF16 | Int4|
+ |-|-|-|-|
+ | 7B | 360Zhinao2-7B-Base | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Base/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Base">🤗</a> | |
+ | 7B | 360Zhinao2-7B-Chat-4K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-4K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-4K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K-Int4">🤗</a> |
+ | 7B | 360Zhinao2-7B-Chat-32K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-32K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-32K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K-Int4">🤗</a> |
+ | 7B | 360Zhinao2-7B-Chat-360K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-360K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-360K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K-Int4">🤗</a> |
+
+ <br>
+
+ # Model Evaluation
+ ## Base Model
+ We used the open-source tool OpenCompass to evaluate the model and compared it with open-source models under 10B released over the past six months. 360Zhinao2-7B is highly competitive: it performs well on Chinese benchmarks such as CEval, C3 and LCSTS, ranking No. 1 in the average score across the Chinese benchmarks, and it also ranks No. 1 on MATH, a challenging competition-math dataset. **360Zhinao2-7B has clear advantages on Chinese benchmarks and challenging competition math.**
+
+ <table>
+ 	<tr>
+ 	    <td>Type</td><td>Datasets</td><td>language</td><td>glm4-9b</td><td>Qwen2.5-7B</td><td>internlm2.5-7b</td><td>Yi1.5-9B</td><td>gemma2-9b</td><td>Llama3.1-8B</td><td>360Zhinao2-7B</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="5">Exam</td><td>ceval</td><td>zh</td><td>75.83</td><td>81.41</td><td>77.71</td><td>73.51</td><td>56.36</td><td>51.67</td><td><strong>83.04</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>mmlu</td><td>en</td><td>75.5</td><td>75.5</td><td>71.55</td><td>71.43</td><td>72.22</td><td>66.75</td><td>67.84</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>cmmlu</td><td>zh</td><td>74.24</td><td>81.79</td><td>78.77</td><td>74.2</td><td>58.89</td><td>52.49</td><td>73.8</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>ARC-c</td><td>en</td><td>94.92</td><td>80</td><td>85.08</td><td>87.46</td><td>77.63</td><td>80.68</td><td>87.12</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>ARC-e</td><td>en</td><td>98.41</td><td>84.83</td><td>95.24</td><td>94.53</td><td>78.84</td><td>89.77</td><td>92.77</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Language</td><td>WiC</td><td>en</td><td>51.57</td><td>52.82</td><td>50.78</td><td>50.63</td><td>50.47</td><td>50</td><td>49.84</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>WSC</td><td>en</td><td>68.27</td><td>68.27</td><td>69.23</td><td>66.35</td><td>68.27</td><td>67.31</td><td>65.38</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Knowledge</td>
+ 	    <td>BoolQ</td><td>en</td><td>81.8</td><td>83.88</td><td>89.51</td><td>84.46</td><td>85.6</td><td>82.2</td><td>88.29</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>commonsense_qa</td><td>en</td><td>71.17</td><td>73.22</td><td>68.55</td><td>71.58</td><td>68.47</td><td>71.25</td><td>69.78</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="6">Understanding</td>
+ 	    <td>C3</td><td>zh</td><td>91.51</td><td>92</td><td>93.04</td><td>85.86</td><td>81.64</td><td>83.51</td><td><strong>93.26</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>race-middle</td><td>en</td><td>91.99</td><td>91.02</td><td>92.06</td><td>91.16</td><td>88.09</td><td>81.69</td><td>90.46</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>race-high</td><td>en</td><td>90.71</td><td>87.91</td><td>90.08</td><td>88.34</td><td>82.08</td><td>78.73</td><td>86.74</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>lcsts</td><td>zh</td><td>18.29</td><td>15.82</td><td>15.96</td><td>16.49</td><td>10.62</td><td>17.29</td><td><strong>18.61</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>eprstmt-dev</td><td>zh</td><td>91.88</td><td>86.88</td><td>91.25</td><td>91.88</td><td>48.12</td><td>83.12</td><td>90</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>lambada</td><td>en</td><td>71.67</td><td>71.14</td><td>69.98</td><td>70.64</td><td>75.43</td><td>74.23</td><td>72.56</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="3">Reasoning</td>
+ 	    <td>hellaswag</td><td>en</td><td>70.25</td><td>72.76</td><td>70.38</td><td>71.55</td><td>66.83</td><td>74.65</td><td>71.49</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>siqa</td><td>en</td><td>81.73</td><td>72.52</td><td>78.97</td><td>76.2</td><td>58.96</td><td>64.18</td><td>77.12</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>bbh</td><td>en</td><td>73.68</td><td>54.63</td><td>59.43</td><td>67.86</td><td>68.45</td><td>59.9</td><td>46.54</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Code</td>
+ 	    <td>humaneval</td><td>en</td><td>69.51</td><td>75</td><td>60.37</td><td>26.22</td><td>5.49</td><td>27.44</td><td>60.98</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>mbpp</td><td>en</td><td>60</td><td>60</td><td>43.6</td><td>56.8</td><td>51.2</td><td>42.6</td><td>54</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Math</td>
+ 	    <td>math</td><td>en</td><td>26.86</td><td>38</td><td>27.14</td><td>27.06</td><td>28.52</td><td>15.32</td><td><strong>38.34</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>gsm8k</td><td>en</td><td>78.54</td><td>79.76</td><td>52.54</td><td>71.11</td><td>73.09</td><td>56.25</td><td>75.51</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Overall</td>
+ 	    <td>avg_zh</td><td></td><td>70.35</td><td>71.58</td><td>71.35</td><td>68.39</td><td>51.13</td><td>57.62</td><td><strong>71.74</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>avg_all</td><td></td><td>73.11</td><td>71.78</td><td>69.60</td><td>68.88</td><td>61.60</td><td>62.32</td><td>70.61</td>
+ 	</tr>
+ </table>
+
+
+ <br>
+
+ # Quickstart
+ We provide simple examples illustrating the use of 360Zhinao2-7B-Base and 360Zhinao2-7B-Chat on 🤖ModelScope and 🤗Transformers.
+
+ ## Dependency Installation
+ - python >= 3.8
+ - pytorch >= 2.0
+ - transformers >= 4.37.2
+ - CUDA >= 11.4
+
+ ```shell
+ pip install -r requirements.txt
+ ```
+
+ Optionally, we recommend installing Flash-Attention 2 to improve performance and reduce memory footprint.
+
+ >flash-attn >= 2.3.6
+ ```shell
+ FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.6
+ ```
+
+ ## 🤗 Transformers
+ ### Demonstration of Base Model Inference
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from transformers.generation import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
+ inputs = inputs.to(model.device)
+
+ pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
+ print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+ ```
+ ### Demonstration of Chat Model Inference
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from transformers.generation import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ messages = []
+ #round-1
+ messages.append({"role": "user", "content": "介绍一下刘德华"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+
+ #round-2
+ messages.append({"role": "user", "content": "他有什么代表作?"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+ ```
+
+ ## 🤖 ModelScope
+ ### Demonstration of Base Model Inference
+
+ ```python
+ from modelscope import AutoModelForCausalLM, AutoTokenizer
+ from modelscope import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
+ inputs = inputs.to(model.device)
+
+ pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
+ print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+ ```
+
+ ### Demonstration of Chat Model Inference
+
+ ```python
+ from modelscope import AutoModelForCausalLM, AutoTokenizer
+ from modelscope import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ messages = []
+ #round-1
+ messages.append({"role": "user", "content": "介绍一下刘德华"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+
+ #round-2
+ messages.append({"role": "user", "content": "他有什么代表作?"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+ ```
+
+ ## CLI Demo
+ Run the command-line demo in a terminal:
+
+ ```shell
+ python cli_demo.py
+ ```
+ <p align="center">
+     <img src="assets/cli_demo.gif" width="600" />
+ </p>
+
+ Note: for Mac users, `device = 'mps'` is not supported yet.
+
+ ## Web Demo
+
+ ```shell
+ streamlit run web_demo.py
+ ```
+ <p align="center">
+     <img src="assets/web_demo.gif" width="600" />
+ </p>
+
+ ## API Demo
+ Launch the API server:
+ ```shell
+ python openai_api.py
+ ```
+
+ Then send a request with the desired parameters:
+ ```shell
+ curl 'http://localhost:8360/v1/chat/completions' \
+ -H 'Content-Type: application/json' \
+ -d '{
+     "max_new_tokens": 200,
+     "do_sample": true,
+     "top_k": 0,
+     "top_p": 0.8,
+     "temperature": 1.0,
+     "repetition_penalty": 1.0,
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"}
+     ]
+ }'
+ ```
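+
+ Equivalently, the same request can be sent from Python. The snippet below is a hedged sketch using the `requests` library against the locally launched `openai_api.py` server; the payload mirrors the curl command above.
+
+ ```python
+ import requests
+
+ # Sketch: POST the same payload as the curl example above to the local API demo.
+ payload = {
+     "max_new_tokens": 200,
+     "do_sample": True,
+     "top_k": 0,
+     "top_p": 0.8,
+     "temperature": 1.0,
+     "repetition_penalty": 1.0,
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"}
+     ]
+ }
+
+ resp = requests.post("http://localhost:8360/v1/chat/completions", json=payload, timeout=60)
+ print(resp.json())
+ ```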
+
+ <br>
+
+ # Model Inference
+ ## Quantization
+ We provide quantization schemes based on AutoGPTQ and have released the corresponding Int4 quantized models.
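+
+ As a hedged illustration (not an official loading recipe), an Int4 checkpoint such as `360Zhinao2-7B-Chat-4K-Int4` can typically be loaded with the same `transformers` pattern shown above, provided the `auto-gptq` package is installed; exact package-version requirements are assumptions here.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from transformers.generation import GenerationConfig
+
+ # Illustrative sketch: loading a released Int4 (GPTQ) chat checkpoint follows
+ # the same pattern as the BF16 chat example above; auto-gptq must be installed.
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K-Int4"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ messages = [{"role": "user", "content": "你好"}]
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ print(response)
+ ```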
+
+ ## Deployment
+ ### vLLM Installation
+ We recommend using `vLLM==0.3.3`.
+
+ If you are using **CUDA 12.1 and PyTorch 2.1**, you can install vLLM directly with:
+ ```shell
+ pip install vllm==0.3.3
+ ```
+
+ Otherwise, please refer to the official vLLM [Installation Instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html).
+
+ After installation, perform the following steps:
+ 1. Copy `vllm/zhinao.py` into `vllm/model_executor/models` in your vllm installation directory (in your python/conda env).
+ 2. Copy `vllm/serving_chat.py` into `vllm/entrypoints/openai` in your vllm installation directory.
+ 3. Add the following line to `vllm/model_executor/models/__init__.py`:
+
+ ```python
+ "ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
+ ```
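+
+ After these steps, one quick way to check that the architecture has been registered is to run offline generation with vLLM's Python API. This is only a smoke-test sketch: it performs plain text completion and does not apply the chat template.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Smoke test: if the ZhinaoForCausalLM registration above succeeded, vLLM can
+ # load the checkpoint and generate a plain-text continuation (no chat template).
+ llm = LLM(model="qihoo360/360Zhinao2-7B-Chat-4K",
+           trust_remote_code=True,
+           max_model_len=4096,
+           tensor_parallel_size=1)
+
+ sampling_params = SamplingParams(temperature=1.0, top_p=0.8, max_tokens=64)
+ outputs = llm.generate(["中国二十四节气\n1. 立春\n2. 雨水\n"], sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```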
+
+ ### Starting the vLLM Service
+
+ Start the service:
+ ```shell
+ python -m vllm.entrypoints.openai.api_server \
+     --served-model-name 360Zhinao2-7B-Chat-4K \
+     --model qihoo360/360Zhinao2-7B-Chat-4K \
+     --trust-remote-code \
+     --tensor-parallel-size 1 \
+     --max-model-len 4096 \
+     --host 0.0.0.0 \
+     --port 8360
+ ```
+
+ Use curl to request the service:
+ ```shell
+ curl http://localhost:8360/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+     "model": "360Zhinao2-7B-Chat-4K",
+     "max_tokens": 200,
+     "top_k": -1,
+     "top_p": 0.8,
+     "temperature": 1.0,
+     "presence_penalty": 0.0,
+     "frequency_penalty": 0.0,
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"}
+     ],
+     "stop": [
+         "<eod>",
+         "<|im_end|>",
+         "<|im_start|>"
+     ]
+ }'
+ ```
+ Use Python to request the service:
+ ```python
+ from openai import OpenAI
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8360/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ chat_response = client.chat.completions.create(
+     model="360Zhinao2-7B-Chat-4K",
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"},
+     ],
+     stop=[
+         "<eod>",
+         "<|im_end|>",
+         "<|im_start|>"
+     ],
+     presence_penalty=0.0,
+     frequency_penalty=0.0
+ )
+ print("Chat response:", chat_response)
+ ```
+
+ > If you need to enable repetition penalty, we recommend setting `presence_penalty` and `frequency_penalty` instead of `repetition_penalty`.
+
+
+ <br>
+
+ # Model Finetune
+ ## Training data
+
+ Training data: `data/training_data_sample.json`. This sample contains 10,000 rows sampled from [multiturn_chat_0.8M](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M) and converted into the format below.
+
+ Data Format:
+ ```json
+ [
+   {
+     "id": 1,
+     "conversations": [
+         {
+             "from": "system",
+             "value": "You are a helpful assistant."
+         },
+         {
+             "from": "user",
+             "value": "您好啊"
+         },
+         {
+             "from": "assistant",
+             "value": "你好!我今天能为您做些什么?有什么问题或需要帮助吗? 我在这里为您提供服务。"
+         }
+     ]
+   }
+ ]
+ ```
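+
+ Below is a small, hedged sketch of how one might load and sanity-check data in this format before launching finetuning. The field names follow the example above; the check itself is illustrative and not part of the official tooling.
+
+ ```python
+ import json
+
+ # Minimal format check for the conversation data shown above (illustrative only).
+ VALID_ROLES = {"system", "user", "assistant"}
+
+ with open("data/training_data_sample.json", "r", encoding="utf-8") as f:
+     samples = json.load(f)
+
+ for sample in samples:
+     assert "conversations" in sample, f"missing conversations in id={sample.get('id')}"
+     for turn in sample["conversations"]:
+         assert turn["from"] in VALID_ROLES, f"unknown role: {turn['from']}"
+         assert isinstance(turn["value"], str) and turn["value"], "empty turn value"
+
+ print(f"{len(samples)} samples passed the format check")
+ ```
+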
+ ## Finetuning scripts
+ ```shell
+ set -x
+
+ HOSTFILE=hostfile
+ DS_CONFIG=./finetune/ds_config_zero2.json
+
+ # PARAMS
+ LR=5e-6
+ EPOCHS=3
+ MAX_LEN=4096
+ BATCH_SIZE=4
+ NUM_NODES=1
+ NUM_GPUS=8
+ MASTER_PORT=29500
+
+ IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)
+
+ DATA_PATH="./data/training_data_sample.json"
+ MODEL_PATH="qihoo360/360Zhinao2-7B-Base"
+ OUTPUT_DIR="./outputs/"
+
+ deepspeed --hostfile ${HOSTFILE} \
+     --master_port ${MASTER_PORT} \
+     --num_nodes ${NUM_NODES} \
+     --num_gpus ${NUM_GPUS} \
+     finetune.py \
+     --report_to "tensorboard" \
+     --data_path ${DATA_PATH} \
+     --model_name_or_path ${MODEL_PATH} \
+     --output_dir ${OUTPUT_DIR} \
+     --model_max_length ${MAX_LEN} \
+     --num_train_epochs ${EPOCHS} \
+     --per_device_train_batch_size ${BATCH_SIZE} \
+     --gradient_accumulation_steps 1 \
+     --save_strategy steps \
+     --save_steps 200 \
+     --learning_rate ${LR} \
+     --lr_scheduler_type cosine \
+     --adam_beta1 0.9 \
+     --adam_beta2 0.95 \
+     --adam_epsilon 1e-8 \
+     --max_grad_norm 1.0 \
+     --weight_decay 0.1 \
+     --warmup_ratio 0.01 \
+     --gradient_checkpointing True \
+     --bf16 True \
+     --tf32 True \
+     --deepspeed ${DS_CONFIG} \
+     --is_concat ${IS_CONCAT} \
+     --logging_steps 1 \
+     --log_on_each_node False
+ ```
+ Run it with:
+ ```shell
+ bash finetune/ds_finetune.sh
+ ```
+ - Configuring `HOSTFILE` switches between single-machine and multi-machine training.
+ - Configuring `DS_CONFIG` switches between ZeRO-1, ZeRO-2 and ZeRO-3.
+ - `fp16`/`bf16` configure mixed-precision training; `bf16` is recommended for consistency with the pretrained model.
+ - `is_concat` controls whether training samples are concatenated (packed) up to `MAX_LEN`; a rough sketch of this packing is shown below.
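+
+ As a rough illustration of what concatenation means here (a sketch under assumptions, not the exact logic in `finetune.py`), packing greedily fills each training sequence with tokenized samples up to `MAX_LEN`:
+
+ ```python
+ def pack_examples(tokenized_examples, max_len=4096, pad_token_id=0):
+     """Greedy packing sketch: concatenate tokenized samples until max_len is
+     reached, then start a new sequence. Illustrative only."""
+     packed, current = [], []
+     for ids in tokenized_examples:          # each item is a list of token ids
+         if current and len(current) + len(ids) > max_len:
+             packed.append(current + [pad_token_id] * (max_len - len(current)))
+             current = []
+         current.extend(ids[:max_len])       # truncate overly long samples
+     if current:
+         packed.append(current + [pad_token_id] * (max_len - len(current)))
+     return packed
+
+ # Toy example: three "tokenized" samples packed into sequences of length 8.
+ print(pack_examples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=8))
+ ```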
+
+ <br>
+
+ # License
+
+ The source code of this repository is licensed under Apache 2.0.
+
+ 360Zhinao open-source models are free for commercial use; you do not need to submit a separate request to us.
README_CN.md ADDED
@@ -0,0 +1,564 @@
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ library_name: transformers
+ tags:
+ - qihoo360
+ - 奇虎360
+ - zhinao
+ - 360Zhinao
+ - pretrain
+ ---
+
+ <p align="left">
+     中文 | &nbsp <a href="./README.md">English</a>&nbsp
+ </p>
+ <br>
+
+ <div align="center">
+ <h1>
+   360智脑
+ </h1>
+ </div>
+ <div align="center">
+ 🤗 <a href="https://huggingface.co/qihoo360">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp
+ 🤖 <a href="https://www.modelscope.cn/profile/qihoo360">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp
+ 💬 <a href="./assets/WeChat.png">WeChat (微信)</a>&nbsp&nbsp
+ </div>
+ <br>
+ <p align="center">
+ Welcome to visit the 360Zhinao official website at <a href="https://ai.360.com">https://ai.360.com</a> to experience more powerful features.
+ </p>
+
+ <br>
+
+ # Introduction
+ 🎉🎉🎉 We have open-sourced a series of works on the 360Zhinao (360智脑) large models. This release includes the following models:
+ - **360Zhinao2-7B-Base**
+ - **360Zhinao2-7B-Chat-4K**
+ - **360Zhinao2-7B-Chat-32K**
+ - **360Zhinao2-7B-Chat-360K**
+
+ Notable features of the 360Zhinao models:
+ - **Base Model:** We adopt the current mainstream two-stage training method. In the first stage, we train on 10T tokens in total with a cosine learning rate schedule; in the second stage, we increase the proportion of high-quality data and train on 100B high-quality tokens, with the learning rate decaying directly to 0. **The total training data for 360Zhinao2-7B amounts to 10.1T tokens.**
+ - **Chat Models:** Powerful chat capabilities, released with three context lengths: 4K, 32K and 360K.
+
+ <br>
+
+ # News and Updates
+ - [2024.11.18] 🔥🔥🔥 We released 360Zhinao2-7B, including the Base model and Chat models with 4K, 32K and 360K context lengths.
+ - [2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first in the Retrieval and Reranking tasks of the [C-MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard), respectively.
+ - [2024.05.20] We extended llama3's context window to 360k and released **llama3-8B-360Zhinao-360k-Instruct**<a href="https://huggingface.co/qihoo360/llama3-8B-360Zhinao-360k-Instruct">🤗</a>.
+ - [2024.04.12] We released 360Zhinao-7B v1.0, including the Base model and Chat models with 4K, 32K and 360K context lengths.
+ The technical report is available on [arXiv](https://arxiv.org/abs/2405.13386).
+
+ <br>
+
+ # Table of Contents
+ - [Download URL](#Download-URL)
+ - [Model Evaluation](#Model-Evaluation)
+ - [Quickstart](#Quickstart)
+ - [Model Inference](#Model-Inference)
+ - [Model Finetune](#Model-Finetune)
+ - [License](#License)
+
+ <br>
+
+ # Download URL
+ The released versions and download links are listed in the table below:
+ | Size | Model | BF16 | Int4|
+ |:-:|-|:-:|:-:|
+ | 7B | 360Zhinao2-7B-Base | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Base/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Base">🤗</a> | |
+ | 7B | 360Zhinao2-7B-Chat-4K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-4K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-4K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K-Int4">🤗</a> |
+ | 7B | 360Zhinao2-7B-Chat-32K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-32K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-32K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K-Int4">🤗</a> |
+ | 7B | 360Zhinao2-7B-Chat-360K | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-360K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K">🤗</a> | <a href="https://www.modelscope.cn/models/qihoo360/360Zhinao2-7B-Chat-360K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K-Int4">🤗</a> |
+
+ <br>
+
+ # Model Evaluation
+ ## Base Model
+ We evaluated the model with the open-source tool OpenCompass and compared it with open-source models under 10B released at home and abroad over the past six months; 360Zhinao2-7B is highly competitive. It performs well on Chinese benchmarks such as CEval (Chinese exams), C3 (Chinese reading comprehension) and LCSTS (Chinese short-text summarization), ranking No. 1 in the average score across the Chinese benchmarks, and it also ranks No. 1 on MATH, a challenging competition-math dataset. **360Zhinao2-7B has clear advantages in Chinese language processing and complex mathematical reasoning.**
+
+ <table>
+ 	<tr>
+ 	    <td>Type</td><td>Datasets</td><td>language</td><td>glm4-9b</td><td>Qwen2.5-7B</td><td>internlm2.5-7b</td><td>Yi1.5-9B</td><td>gemma2-9b</td><td>Llama3.1-8B</td><td>360Zhinao2-7B</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="5">Exam</td><td>ceval</td><td>zh</td><td>75.83</td><td>81.41</td><td>77.71</td><td>73.51</td><td>56.36</td><td>51.67</td><td><strong>83.04</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>mmlu</td><td>en</td><td>75.5</td><td>75.5</td><td>71.55</td><td>71.43</td><td>72.22</td><td>66.75</td><td>67.84</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>cmmlu</td><td>zh</td><td>74.24</td><td>81.79</td><td>78.77</td><td>74.2</td><td>58.89</td><td>52.49</td><td>73.8</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>ARC-c</td><td>en</td><td>94.92</td><td>80</td><td>85.08</td><td>87.46</td><td>77.63</td><td>80.68</td><td>87.12</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>ARC-e</td><td>en</td><td>98.41</td><td>84.83</td><td>95.24</td><td>94.53</td><td>78.84</td><td>89.77</td><td>92.77</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Language</td><td>WiC</td><td>en</td><td>51.57</td><td>52.82</td><td>50.78</td><td>50.63</td><td>50.47</td><td>50</td><td>49.84</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>WSC</td><td>en</td><td>68.27</td><td>68.27</td><td>69.23</td><td>66.35</td><td>68.27</td><td>67.31</td><td>65.38</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Knowledge</td>
+ 	    <td>BoolQ</td><td>en</td><td>81.8</td><td>83.88</td><td>89.51</td><td>84.46</td><td>85.6</td><td>82.2</td><td>88.29</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>commonsense_qa</td><td>en</td><td>71.17</td><td>73.22</td><td>68.55</td><td>71.58</td><td>68.47</td><td>71.25</td><td>69.78</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="6">Understanding</td>
+ 	    <td>C3</td><td>zh</td><td>91.51</td><td>92</td><td>93.04</td><td>85.86</td><td>81.64</td><td>83.51</td><td><strong>93.26</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>race-middle</td><td>en</td><td>91.99</td><td>91.02</td><td>92.06</td><td>91.16</td><td>88.09</td><td>81.69</td><td>90.46</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>race-high</td><td>en</td><td>90.71</td><td>87.91</td><td>90.08</td><td>88.34</td><td>82.08</td><td>78.73</td><td>86.74</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>lcsts</td><td>zh</td><td>18.29</td><td>15.82</td><td>15.96</td><td>16.49</td><td>10.62</td><td>17.29</td><td><strong>18.61</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>eprstmt-dev</td><td>zh</td><td>91.88</td><td>86.88</td><td>91.25</td><td>91.88</td><td>48.12</td><td>83.12</td><td>90</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>lambada</td><td>en</td><td>71.67</td><td>71.14</td><td>69.98</td><td>70.64</td><td>75.43</td><td>74.23</td><td>72.56</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="3">Reasoning</td>
+ 	    <td>hellaswag</td><td>en</td><td>70.25</td><td>72.76</td><td>70.38</td><td>71.55</td><td>66.83</td><td>74.65</td><td>71.49</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>siqa</td><td>en</td><td>81.73</td><td>72.52</td><td>78.97</td><td>76.2</td><td>58.96</td><td>64.18</td><td>77.12</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>bbh</td><td>en</td><td>73.68</td><td>54.63</td><td>59.43</td><td>67.86</td><td>68.45</td><td>59.9</td><td>46.54</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Code</td>
+ 	    <td>humaneval</td><td>en</td><td>69.51</td><td>75</td><td>60.37</td><td>26.22</td><td>5.49</td><td>27.44</td><td>60.98</td>
+ 	</tr>
+ 	<tr>
+ 	    <td>mbpp</td><td>en</td><td>60</td><td>60</td><td>43.6</td><td>56.8</td><td>51.2</td><td>42.6</td><td>54</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Math</td>
+ 	    <td>math</td><td>en</td><td>26.86</td><td>38</td><td>27.14</td><td>27.06</td><td>28.52</td><td>15.32</td><td><strong>38.34</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>gsm8k</td><td>en</td><td>78.54</td><td>79.76</td><td>52.54</td><td>71.11</td><td>73.09</td><td>56.25</td><td>75.51</td>
+ 	</tr>
+ 	<tr>
+ 	    <td rowspan="2">Overall</td>
+ 	    <td>avg_zh</td><td></td><td>70.35</td><td>71.58</td><td>71.35</td><td>68.39</td><td>51.13</td><td>57.62</td><td><strong>71.74</strong></td>
+ 	</tr>
+ 	<tr>
+ 	    <td>avg_all</td><td></td><td>73.11</td><td>71.78</td><td>69.60</td><td>68.88</td><td>61.60</td><td>62.32</td><td>70.61</td>
+ 	</tr>
+ </table>
+
+ # Quickstart
+ Below are simple examples showing how to quickly use 360Zhinao2-7B-Base and 360Zhinao2-7B-Chat with 🤖 ModelScope and 🤗 Transformers.
+
+ ## Dependency Installation
+ - python 3.8 and above
+ - pytorch 2.0 and above
+ - transformers 4.37.2 and above
+ - CUDA 11.4 and above are recommended.
+
+ ```shell
+ pip install -r requirements.txt
+ ```
+ We recommend installing flash-attention (flash attention 2 is supported) to improve runtime efficiency and reduce memory usage. flash-attention is optional; the project also runs without it.
+
+ >flash-attn >= 2.3.6
+ ```shell
+ FLASH_ATTENTION_FORCE_BUILD=TRUE pip install flash-attn==2.3.6
+ ```
+
+
+ ## 🤗 Transformers
+ ### Base Model Inference
+
+ This example shows how to quickly run inference with the 360Zhinao2-7B-Base model using transformers.
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from transformers.generation import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
+ inputs = inputs.to(model.device)
+
+ pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
+ print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+ ```
+
+ ### Chat Model Inference
+
+ This example shows how to quickly run inference with the 360Zhinao2-7B-Chat-4K model using transformers.
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from transformers.generation import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ messages = []
+ #round-1
+ messages.append({"role": "user", "content": "介绍一下刘德华"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+
+ #round-2
+ messages.append({"role": "user", "content": "他有什么代表作?"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+ ```
+
+ ## 🤖 ModelScope
+ ### Base Model Inference
+
+ This example shows how to quickly run inference with the 360Zhinao2-7B-Base model using ModelScope.
+
+ ```python
+ from modelscope import AutoModelForCausalLM, AutoTokenizer
+ from modelscope import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
+ inputs = inputs.to(model.device)
+
+ pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
+ print("outputs:\n", tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
+ ```
+
+ ### Chat Model Inference
+
+ This example shows how to quickly run inference with the 360Zhinao2-7B-Chat-4K model using ModelScope.
+ ```python
+ from modelscope import AutoModelForCausalLM, AutoTokenizer
+ from modelscope import GenerationConfig
+
+ MODEL_NAME_OR_PATH = "qihoo360/360Zhinao2-7B-Chat-4K"
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     device_map="auto",
+     trust_remote_code=True)
+
+ generation_config = GenerationConfig.from_pretrained(
+     MODEL_NAME_OR_PATH,
+     trust_remote_code=True)
+
+ messages = []
+ #round-1
+ messages.append({"role": "user", "content": "介绍一下刘德华"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+
+ #round-2
+ messages.append({"role": "user", "content": "他有什么代表作?"})
+ response = model.chat(tokenizer=tokenizer, messages=messages, generation_config=generation_config)
+ messages.append({"role": "assistant", "content": response})
+ print(messages)
+ ```
+
+ ## CLI Demo
+ You can try the model quickly through an interactive terminal:
+ ```shell
+ python cli_demo.py
+ ```
+ <p align="center">
+     <img src="assets/cli_demo.gif" width="600" />
+ </p>
+
+ Note: `device = 'mps'` on Mac is not supported yet.
+
+ ## Web Demo
+ You can also try it quickly through a web interface:
+ ```shell
+ streamlit run web_demo.py
+ ```
+ <p align="center">
+     <img src="assets/web_demo.gif" width="600" />
+ </p>
+
+ ## API Demo
+ Launch the API server:
+ ```shell
+ python openai_api.py
+ ```
+
+ Then send a request with the desired parameters:
+ ```shell
+ curl 'http://localhost:8360/v1/chat/completions' \
+ -H 'Content-Type: application/json' \
+ -d '{
+     "max_new_tokens": 200,
+     "do_sample": true,
+     "top_k": 0,
+     "top_p": 0.8,
+     "temperature": 1.0,
+     "repetition_penalty": 1.0,
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"}
+     ]
+ }'
+ ```
+
+ <br>
+
+ # Model Inference
+ ## Quantization
+ We provide a quantization scheme based on AutoGPTQ and have open-sourced the Int4 quantized models.
+
+ ## Deployment
+ ### vLLM Installation
+ For deployment and faster inference, we recommend using `vLLM==0.3.3`.
+
+ If you are using **CUDA 12.1 and PyTorch 2.1**, you can install vLLM directly with the following command.
+ ```shell
+ pip install vllm==0.3.3
+ ```
+
+ Otherwise, please refer to the official vLLM [installation instructions](https://docs.vllm.ai/en/latest/getting_started/installation.html).
+
+ > After installation, the following steps are also required:
+ 1. Copy `vllm/zhinao.py` into the `vllm/model_executor/models` directory of your environment.
+ 2. Copy `vllm/serving_chat.py` into the `vllm/entrypoints/openai` directory of your environment.
+ 3. Then add the following line to `vllm/model_executor/models/__init__.py`:
+
+ ```python
+ "ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
+ ```
+
+ ### Starting the vLLM Service
+
+ Start the service:
+ ```shell
+ python -m vllm.entrypoints.openai.api_server \
+     --served-model-name 360Zhinao2-7B-Chat-4K \
+     --model qihoo360/360Zhinao2-7B-Chat-4K \
+     --trust-remote-code \
+     --tensor-parallel-size 1 \
+     --max-model-len 4096 \
+     --host 0.0.0.0 \
+     --port 8360
+ ```
+
+ Use curl to request the service:
+ ```shell
+ curl http://localhost:8360/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+     "model": "360Zhinao2-7B-Chat-4K",
+     "max_tokens": 200,
+     "top_k": -1,
+     "top_p": 0.8,
+     "temperature": 1.0,
+     "presence_penalty": 0.0,
+     "frequency_penalty": 0.0,
+     "messages": [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"}
+     ],
+     "stop": [
+         "<eod>",
+         "<|im_end|>",
+         "<|im_start|>"
+     ]
+ }'
+ ```
+ Use Python to request the service:
+ ```python
+ from openai import OpenAI
+ # Set OpenAI's API key and API base to use vLLM's API server.
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8360/v1"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ chat_response = client.chat.completions.create(
+     model="360Zhinao2-7B-Chat-4K",
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "你好"},
+     ],
+     stop=[
+         "<eod>",
+         "<|im_end|>",
+         "<|im_start|>"
+     ],
+     presence_penalty=0.0,
+     frequency_penalty=0.0
+ )
+ print("Chat response:", chat_response)
+ ```
+
+ > Note: if you need to enable a repetition penalty, we recommend using the *presence_penalty* and *frequency_penalty* parameters.
+
+ <br>
+
+ # Model Finetune
+ ## Training Data
+
+ We provide sample finetuning data in `data/training_data_sample.json`, which contains 10,000 rows sampled from [multiturn_chat_0.8M](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M) and converted into the format below.
+
+ Data format:
+ ```json
+ [
+   {
+     "id": 1,
+     "conversations": [
+         {
+             "from": "system",
+             "value": "You are a helpful assistant."
+         },
+         {
+             "from": "user",
+             "value": "您好啊"
+         },
+         {
+             "from": "assistant",
+             "value": "你好!我今天能为您做些什么?有什么问题或需要帮助吗? 我在这里为您提供服务。"
+         }
+     ]
+   }
+ ]
+ ```
+
+ ## Finetuning
+ The training script is as follows:
+ ```shell
+ set -x
+
+ HOSTFILE=hostfile
+ DS_CONFIG=./finetune/ds_config_zero2.json
+
+ # PARAMS
+ LR=5e-6
+ EPOCHS=3
+ MAX_LEN=4096
+ BATCH_SIZE=4
+ NUM_NODES=1
+ NUM_GPUS=8
+ MASTER_PORT=29500
+
+ IS_CONCAT=False # Whether to concatenate data up to the maximum length (MAX_LEN)
+
+ DATA_PATH="./data/training_data_sample.json"
+ MODEL_PATH="qihoo360/360Zhinao2-7B-Base"
+ OUTPUT_DIR="./outputs/"
+
+ deepspeed --hostfile ${HOSTFILE} \
+     --master_port ${MASTER_PORT} \
+     --num_nodes ${NUM_NODES} \
+     --num_gpus ${NUM_GPUS} \
+     finetune.py \
+     --report_to "tensorboard" \
+     --data_path ${DATA_PATH} \
+     --model_name_or_path ${MODEL_PATH} \
+     --output_dir ${OUTPUT_DIR} \
+     --model_max_length ${MAX_LEN} \
+     --num_train_epochs ${EPOCHS} \
+     --per_device_train_batch_size ${BATCH_SIZE} \
+     --gradient_accumulation_steps 1 \
+     --save_strategy steps \
+     --save_steps 200 \
+     --learning_rate ${LR} \
+     --lr_scheduler_type cosine \
+     --adam_beta1 0.9 \
+     --adam_beta2 0.95 \
+     --adam_epsilon 1e-8 \
+     --max_grad_norm 1.0 \
+     --weight_decay 0.1 \
+     --warmup_ratio 0.01 \
+     --gradient_checkpointing True \
+     --bf16 True \
+     --tf32 True \
+     --deepspeed ${DS_CONFIG} \
+     --is_concat ${IS_CONCAT} \
+     --logging_steps 1 \
+     --log_on_each_node False
+ ```
+ ```shell
+ bash finetune/ds_finetune.sh
+ ```
+ - Configure `hostfile` to switch between single-machine and multi-machine training.
+ - Configure `ds_config` to switch between ZeRO-2 and ZeRO-3.
+ - Configure `fp16`/`bf16` for mixed-precision training; `bf16` is recommended for consistency with the pretrained model.
+ - Configure the `is_concat` parameter to control whether the training data is concatenated; when the training data volume is large, concatenation can improve training efficiency.
+
+ <br>
+
+ # License
+
+ The source code of this repository is licensed under Apache 2.0.
+
+ 360Zhinao open-source models are free for commercial use; no special application to us is required.