Upload README.md with huggingface_hub
README.md
CHANGED
@@ -24,7 +24,7 @@ tags:
|
|
24 |
</div>
|
25 |
<div align="center">
|
26 |
🤗 <a href="https://huggingface.co/qihoo360">HuggingFace</a>   |   
|
27 |
-
🤖 <a href="https://
|
28 |
💬 <a href="./assets/WeChat.png">WeChat (微信)</a>  
|
29 |
</div>
|
30 |
<br>
|
@@ -71,10 +71,10 @@ Technical report is on [arXiv](https://arxiv.org/abs/2405.13386).
|
|
71 |
|
72 |
| Size | Model | BF16 | Int4|
|
73 |
|-|-|-|-|
|
74 |
-
| 7B | 360Zhinao2-7B-Base | <a href="https://
|
75 |
-
| 7B | 360Zhinao2-7B-Chat-4K | <a href="https://
|
76 |
-
| 7B | 360Zhinao2-7B-Chat-32K | <a href="https://
|
77 |
-
| 7B | 360Zhinao2-7B-Chat-360K | <a href="https://
|
78 |
|
79 |
<br>
|
80 |
|
@@ -169,6 +169,60 @@ We used the open-source tool OpenCompass to evaluate the model and compared it w
|
|
169 |
|
170 |
<br>
|
171 |
|
172 |
# Quickstart
|
173 |
We provide simple examples illustrating the use of 360Zhinao2-7B-Base and 360Zhinao2-7B-Chat on 🤖ModelScope and 🤗Transformers.
|
174 |
|
|
|
24 |
</div>
|
25 |
<div align="center">
|
26 |
🤗 <a href="https://huggingface.co/qihoo360">HuggingFace</a>   |   
|
27 |
+
🤖 <a href="https://modelscope.cn/organization/360zhinao">ModelScope</a>   |   
|
28 |
💬 <a href="./assets/WeChat.png">WeChat (微信)</a>  
|
29 |
</div>
|
30 |
<br>
|
|
|
71 |
|
72 |
| Size | Model | BF16 | Int4|
|
73 |
|-|-|-|-|
|
74 |
+
| 7B | 360Zhinao2-7B-Base | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Base/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Base">🤗</a> | |
|
75 |
+
| 7B | 360Zhinao2-7B-Chat-4K | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-4K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K">🤗</a> | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-4K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-4K-Int4">🤗</a> |
|
76 |
+
| 7B | 360Zhinao2-7B-Chat-32K | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-32K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K">🤗</a> | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-32K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-32K-Int4">🤗</a> |
|
77 |
+
| 7B | 360Zhinao2-7B-Chat-360K | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-360K/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K">🤗</a> | <a href="https://modelscope.cn/models/360zhinao/360Zhinao2-7B-Chat-360K-Int4/summary">🤖</a> <a href="https://huggingface.co/qihoo360/360Zhinao2-7B-Chat-360K-Int4">🤗</a> |
|
78 |
|
79 |
<br>
|
80 |
|
|
|
169 |
|
170 |
<br>
|
171 |
|
172 |
+
|
173 |
+
## Chat Model
|
174 |
+
|
175 |
+
### Post-Training Data
|
176 |
+
360's proprietary general fine-tuning dataset consists of 500,000 samples. It covers a range of skills as well as 360's vertical business data, and was built with the following methods:
|
177 |
+
|
178 |
+
1. **Data Diversity**: Stratified sampling based on 360's proprietary tagging system, taking domain, intent, difficulty, and length into account to ensure instruction diversity.
|
179 |
+
2. **Data Quality**: We trained 360gpt-pro-rm (reward benchmark score of 92.59) on open-source data and proprietary preference-ranked data, then used it to screen samples and filter out low-quality responses.
|
180 |
+
3. **Complex Instruction Evolution**: Optimizing complex instructions through evolutionary methods to enhance instruction-following capabilities.
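
The stratified sampling in item 1 above can be sketched roughly as follows. This is a minimal illustration, not 360's pipeline: the tag names, the `stratified_sample` helper, the per-stratum quota, and the toy `pool` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(samples, keys, per_stratum, seed=0):
    """Draw up to `per_stratum` samples from each combination of tag values."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        # A stratum is identified by the tuple of its tag values.
        strata[tuple(sample[k] for k in keys)].append(sample)
    picked = []
    for _, group in sorted(strata.items()):
        rng.shuffle(group)
        picked.extend(group[:per_stratum])
    return picked


# Hypothetical tagged instruction pool (tags and values are illustrative).
pool = [
    {"domain": "math", "difficulty": "hard", "text": "prove ..."},
    {"domain": "math", "difficulty": "hard", "text": "derive ..."},
    {"domain": "math", "difficulty": "easy", "text": "add ..."},
    {"domain": "code", "difficulty": "easy", "text": "print ..."},
    {"domain": "code", "difficulty": "easy", "text": "loop ..."},
]

subset = stratified_sample(pool, ["domain", "difficulty"], per_stratum=1)
print(len(subset))
```

Capping each stratum keeps over-represented tag combinations from dominating the final instruction mix.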
|
181 |
+
|
182 |
+
### Training Methods
|
183 |
+
1. **Full Parameter Fine-Tuning**
|
184 |
+
|
185 |
+
Based on the general post-training data, full parameter fine-tuning is performed, and the optimal checkpoint is selected as sft-base.
|
186 |
+
|
187 |
+
2. **LoRA Off-policy DPO**
|
188 |
+
|
189 |
+
Using human-labeled preference pairs, LoRA fine-tuning is applied to the sft-base model, followed by LoRA DPO training.
|
190 |
+
|
191 |
+
3. **Iterative On-Policy DPO**
|
192 |
+
|
193 |
+
The sft-base model samples multiple answers on training prompts, and 360gpt-pro-rm scores them. The highest and lowest scoring answers form each pair, and such pairs are used for DPO training. This on-policy DPO method is iteratively used to improve model performance.
|
194 |
+
|
195 |
+
4. **Model Merging**
|
196 |
+
|
197 |
+
Automatic evaluation on 360's white-box evaluation set v4 showed that different models excel at different skills, so a model merging scheme was used. With the sft model as the base, we interpolated with model v1, then extrapolated between the sft model and model v1 with an extrapolation coefficient of 0.2. This produced the final 360Zhinao2-7B-Chat-4K model.
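
Steps 3 and 4 above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the actual training code: `build_preference_pair` and `combine_weights` are hypothetical helpers, weights are shown as plain floats instead of tensors, and treating the 0.2 extrapolation coefficient as `alpha = 1 + 0.2` is only one possible reading of the description.

```python
def build_preference_pair(prompt, answers, scores):
    """Pair the highest- and lowest-scoring sampled answers for one prompt."""
    if len(answers) != len(scores) or len(answers) < 2:
        raise ValueError("need at least two scored answers")
    ranked = sorted(zip(scores, answers), key=lambda pair: pair[0])
    return {"prompt": prompt, "chosen": ranked[-1][1], "rejected": ranked[0][1]}


def combine_weights(base, other, alpha):
    """Return base + alpha * (other - base) for every parameter in `base`.

    0 <= alpha <= 1 interpolates between the two models;
    alpha > 1 extrapolates past `other`, away from `base`.
    """
    return {name: base[name] + alpha * (other[name] - base[name]) for name in base}


# Reward-model scores for three sampled answers (values are made up).
pair = build_preference_pair("q", ["a", "b", "c"], [0.2, 0.9, 0.5])

# Toy one-parameter "models" standing in for the sft model and model v1.
sft, v1 = {"w": 1.0}, {"w": 3.0}
interpolated = combine_weights(sft, v1, alpha=0.5)
extrapolated = combine_weights(sft, v1, alpha=1.2)  # assumption: 1 + 0.2
```

In practice the same arithmetic would be applied to every tensor in the two checkpoints' state dicts.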
|
198 |
+
|
199 |
+
### Model Performance
|
200 |
+
We evaluated the 360Zhinao2-7B-Chat-4K model on several classic tasks. On IFEval (strict prompt), it scored second only to GLM4-9B and highest among the open-source 7B models compared. On MT-bench, it ranked third overall, behind GLM4-9B and Qwen2.5-7B, and second among 7B models. On CFBench, it placed third on CSR and ISR, and on PSR it tied with Qwen2.5-7B for second, behind GLM4-9B. Detailed results are shown in the table below:
|
201 |
+
|
202 |
+
| Model | MT-bench | IFEval (strict prompt) | CFBench CSR | CFBench ISR | CFBench PSR |
|
203 |
+
|----------------------|----------|-----------------------|----------------------|------|------|
|
204 |
+
| Qwen2.5-7B-Instruct | **8.07** | 0.556 | **0.81** | 0.46 | 0.57 |
|
205 |
+
| Yi-9B-16k-Chat | 7.44 | 0.455 | 0.75 | 0.4 | 0.52 |
|
206 |
+
| GLM4-9B-Chat | **8.08** | **0.634** | **0.82** | 0.48 | 0.61 |
|
207 |
+
| InternLM2.5-7B-Chat | 7.39 | 0.540 | 0.78 | 0.4 | 0.54 |
|
208 |
+
| 360Zhinao2-7B-Chat-4K | 7.86 | **0.577** | 0.8 | 0.44 | 0.57 |
|
209 |
+
|
210 |
+
### Long Context Fine-Tuning
|
211 |
+
As with the open-source release of 360Zhinao1, we raised the RoPE base to 1,000,000 and 50,000,000 and concatenated mixed long- and short-text SFT data to lengths of 32k and 360k, respectively. By combining techniques such as gradient checkpointing, ZeRO3 offload, and ring attention, we fine-tuned models with 32k and 360k long-context capabilities. These models ranked in the top tier across various 32k benchmarks.
|
212 |
+
|
213 |
+
| Model | LooGLE-Long Dependency QA | Loong-Set 1 (32k) | LongBench-Chat (32k cutoff) | LEval-96 question subset | LEval-closed ended |
|
214 |
+
|------------------------------|-----------------|-------------------|--------------------------|--------------------|------------------|
|
215 |
+
| GLM4-9B-Chat | 0.36 | 55.24 | 6.60 | 0.49 | 63.96 |
|
216 |
+
| InternLM2.5-7B-Chat | 0.39 | 42.76 | 5.70 | 0.44 | 61.64 |
|
217 |
+
| 360Zhinao2-7B-Chat-32k | 0.33 | 39.37 | 5.44 | 0.44 | 60.48 |
|
218 |
+
| 360Zhinao2-7B-Chat-360k | 0.34 | 32.16 | 5.08 | 0.38 | 53.00 |
|
219 |
+
| Yi-1.5-9B-Chat | 0.25 | 32.77 | 4.70 | 0.37 | 56.22 |
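
To illustrate why raising the RoPE base helps with long contexts, the sketch below computes the rotary frequencies for several bases and compares the longest wavelength, in tokens, that each can represent. It uses the standard RoPE frequency formula; the head dimension of 128 and the comparison base of 10,000 (a commonly used default) are assumptions for illustration and are not taken from this README.

```python
import math


def rope_inv_freq(head_dim, base):
    """Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, base):
    """Wavelength (in tokens) of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(head_dim, base)[-1]


HEAD_DIM = 128  # assumed head dimension, for illustration only

for base in (10_000, 1_000_000, 50_000_000):
    print(f"base={base:>11,}  longest wavelength ~ {longest_wavelength(HEAD_DIM, base):,.0f} tokens")
```

A larger base slows the rotation of the low-frequency dimensions, so positions far apart in a 32k or 360k sequence remain distinguishable.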
|
220 |
+
|
221 |
+
<br>
|
222 |
+
|
223 |
+
|
224 |
+
|
225 |
+
|
226 |
# Quickstart
|
227 |
We provide simple examples illustrating the use of 360Zhinao2-7B-Base and 360Zhinao2-7B-Chat on 🤖ModelScope and 🤗Transformers.
|
228 |
|