|
--- |
|
license: apache-2.0 |
|
language: |
|
- zh |
|
pipeline_tag: text-generation |
|
--- |
|
# 4x1.8B MoE Qwen Ckpt 18000 |
|
|
|
This is a Mixture-of-Experts (MoE) model built on the Qwen 1.8B model. We combined four copies of the original model into a single 4-expert MoE and trained the result with a custom training procedure.
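The conversion code itself is not published in this card, but the general recipe of duplicating a dense checkpoint's feed-forward layers into experts behind a learned router (often called sparse upcycling) can be sketched as follows. Everything here, including the class name, top-1 routing, and the toy FFN, is an illustrative assumption rather than this project's actual implementation:

```python
import copy

import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    """Sketch of "upcycling" a dense FFN into a 4-expert MoE layer."""

    def __init__(self, dense_ffn: nn.Module, hidden_size: int, num_experts: int = 4):
        super().__init__()
        # Each expert starts as an identical copy of the pretrained dense FFN.
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))
        # A small router scores each token; top-1 routing keeps the sketch simple.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        weights = self.router(x).softmax(dim=-1)  # (batch, seq, num_experts)
        top_w, top_idx = weights.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                   # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Tiny smoke test with a toy dense FFN standing in for the real MLP block.
dense = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))
moe = MoEFeedForward(dense, hidden_size=16)
print(moe(torch.randn(2, 5, 16)).shape)  # torch.Size([2, 5, 16])
```

Starting every expert from the same pretrained weights means the MoE initially behaves like the dense model, and the experts only diverge during the subsequent training stage.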
|
|
|
This model is a checkpoint (step 18000) from the continued-pretraining stage.
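The checkpoint should load like any other Hugging Face causal LM. A minimal usage sketch, assuming a placeholder repository id and that the custom MoE modeling code ships with the checkpoint (hence `trust_remote_code=True`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual path of this checkpoint.
repo_id = "your-org/4x1.8B-moe-qwen-ckpt-18000"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",       # spread the 4-expert model across available devices
    trust_remote_code=True,  # the MoE architecture needs the bundled modeling code
)

inputs = tokenizer("你好，世界", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```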
|
|
|
![Training loss during continued pretraining](loss_plot.png)
|
|
|
# Evaluations |
|
|
|
| Groups      | Metric   |  Value |   | Stderr |
|-------------|----------|-------:|---|-------:|
| boolq       | acc      | 0.6502 | ± | 0.0083 |
| ceval-valid | acc      | 0.5171 | ± | 0.1872 |
|             | acc_norm | 0.5171 | ± | 0.1872 |
| cmmlu       | acc      | 0.5041 | ± | 0.1222 |
|             | acc_norm | 0.5041 | ± | 0.1222 |
| mathqa      | acc      | 0.2693 | ± | 0.0081 |
|             | acc_norm | 0.2693 | ± | 0.0081 |
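The table layout matches the output of EleutherAI's lm-evaluation-harness, so the scores can plausibly be reproduced along these lines. The use of the harness and the repository id are assumptions; task names follow the harness's conventions:

```python
import lm_eval

# Hypothetical repository id; trust_remote_code loads the bundled MoE code.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/4x1.8B-moe-qwen-ckpt-18000,trust_remote_code=True",
    tasks=["boolq", "ceval-valid", "cmmlu", "mathqa"],
    batch_size=8,
)
print(results["results"])
```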
|
|
|
# Acknowledgements |
|
|
|
+ [Qwen](https://github.com/QwenLM/Qwen) |
|
+ [mistral.ai](https://mistral.ai) |
|
|
|
# License Agreement |
|
|
|
This project is open-sourced under the Tongyi Qianwen Research License Agreement. The complete license text is available here: [LICENSE](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).
|
|
|
When using this project, please make sure your use complies with the terms and conditions of the license agreement.