4x1.8B MoE Qwen Ckpt 18000

This project builds a Mixture-of-Experts (MoE) model on top of Qwen 1.8B: we combined four copies of the original model into a single MoE model and trained it using special training methods.
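The card does not say how the four models are combined. One common way to turn a dense checkpoint into a 4-expert MoE is "sparse upcycling": each expert starts as a copy of the dense feed-forward (MLP) block, and a newly initialized router selects the top-k experts for each token. The NumPy sketch below shows only that generic idea. The ReLU MLP (Qwen actually uses a gated MLP), top-2 routing, toy dimensions, and helper names (`dense_mlp`, `upcycle`, `moe_forward`) are all illustrative assumptions, not this project's code.

```python
import numpy as np

def dense_mlp(x, w_in, w_out):
    # Toy two-layer feed-forward block with ReLU (hypothetical stand-in for Qwen's MLP).
    return np.maximum(x @ w_in, 0.0) @ w_out

def upcycle(w_in, w_out, num_experts=4):
    # Assumed scheme: every expert starts as an exact copy of the dense MLP weights.
    return [(w_in.copy(), w_out.copy()) for _ in range(num_experts)]

def moe_forward(x, experts, w_router, top_k=2):
    # x: (tokens, d_model); w_router: (d_model, num_experts). Top-k routing is an assumption.
    logits = x @ w_router
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]              # indices of the top-k experts
        gates = np.exp(logits[t, top] - logits[t, top].max())
        gates /= gates.sum()                               # softmax over the selected experts only
        for g, e in zip(gates, top):
            w_in, w_out = experts[e]
            out[t] += g * dense_mlp(x[t], w_in, w_out)     # gate-weighted sum of expert outputs
    return out

rng = np.random.default_rng(0)
d_model, d_ff, tokens = 8, 16, 5
w_in = rng.normal(size=(d_model, d_ff))
w_out = rng.normal(size=(d_ff, d_model))
x = rng.normal(size=(tokens, d_model))

experts = upcycle(w_in, w_out, num_experts=4)
w_router = rng.normal(size=(d_model, 4)) * 0.01            # freshly initialized router

# Right after upcycling, all experts are identical and the gates sum to 1,
# so the MoE layer reproduces the dense layer's output.
print(np.allclose(moe_forward(x, experts, w_router), dense_mlp(x, w_in, w_out)))  # True
```

In this generic scheme the experts only start to differ once the combined model is trained further, because freshly upcycled experts all compute the same function.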

This model is a checkpoint from the continued-pretraining stage.

Evaluations

Groups        Metric     Value    Stderr
boolq         acc        0.6502   ± 0.0083
ceval-valid   acc        0.5171   ± 0.1872
              acc_norm   0.5171   ± 0.1872
cmmlu         acc        0.5041   ± 0.1222
              acc_norm   0.5041   ± 0.1222
mathqa        acc        0.2693   ± 0.0081
              acc_norm   0.2693   ± 0.0081
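For single-task rows such as boolq and mathqa, the ± values are consistent with the standard error of a mean accuracy. The sketch below assumes (i) each example scores 0 or 1, (ii) the sample-variance (n - 1) convention used by EleutherAI's lm-evaluation-harness, whose output this table resembles, and (iii) split sizes of 3270 (BoolQ validation) and 2985 (MathQA test). The card itself states none of these. The helper name `accuracy_stderr` is hypothetical, and the sketch does not cover the grouped rows (ceval-valid, cmmlu), which aggregate many subtasks.

```python
import math

def accuracy_stderr(acc, n):
    # Standard error of the mean of n per-example 0/1 scores with mean `acc`,
    # using the sample (n - 1) variance: sqrt(acc * (1 - acc) / (n - 1)).
    return math.sqrt(acc * (1.0 - acc) / (n - 1))

# Assumed split sizes: BoolQ validation (3270) and MathQA test (2985).
print(round(accuracy_stderr(0.6502, 3270), 4))  # 0.0083
print(round(accuracy_stderr(0.2693, 2985), 4))  # 0.0081
```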

Acknowledgements

License Agreement

This project is open source under the Tongyi Qianwen Research License Agreement. The complete license agreement is available in the LICENSE file.

When using this project, please ensure that your use complies with the terms and conditions of the license agreement.

Model size: 4.27B params
Tensor types: F32, BF16 (Safetensors)