|
--- |
|
license: apache-2.0 |
|
language: |
|
- zh |
|
pipeline_tag: text-generation |
|
--- |
|
# 4x1.8B MoE Qwen Ckpt 18000 |
|
|
|
This is a Mixture-of-Experts (MoE) model built on the Qwen 1.8B model. We combined four copies of the original model into a single 4-expert MoE and trained the result with a custom training procedure.
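The conversion code itself is not published in this card, but the general recipe of duplicating a dense checkpoint's feed-forward layers into experts behind a learned router (often called sparse upcycling) can be sketched as follows. Everything here, including the class name, top-1 routing, and the toy FFN, is an illustrative assumption rather than this project's actual implementation:

```python
import copy

import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    """Sketch of "upcycling" a dense FFN into a 4-expert MoE layer."""

    def __init__(self, dense_ffn: nn.Module, hidden_size: int, num_experts: int = 4):
        super().__init__()
        # Each expert starts as an identical copy of the pretrained dense FFN.
        self.experts = nn.ModuleList(copy.deepcopy(dense_ffn) for _ in range(num_experts))
        # A small router scores each token; top-1 routing keeps the sketch simple.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden)
        weights = self.router(x).softmax(dim=-1)  # (batch, seq, num_experts)
        top_w, top_idx = weights.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                   # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Tiny smoke test with a toy dense FFN standing in for the real MLP block.
dense = nn.Sequential(nn.Linear(16, 64), nn.SiLU(), nn.Linear(64, 16))
moe = MoEFeedForward(dense, hidden_size=16)
print(moe(torch.randn(2, 5, 16)).shape)  # torch.Size([2, 5, 16])
```

Starting every expert from the same pretrained weights means the MoE initially behaves like the dense model, and the experts only diverge during the subsequent training stage.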
|
|
|
This model is a checkpoint (step 18000) from the continued-pretraining stage.
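The checkpoint should load like any other Hugging Face causal LM. A minimal usage sketch, assuming a placeholder repository id and that the custom MoE modeling code ships with the checkpoint (hence `trust_remote_code=True`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual path of this checkpoint.
repo_id = "your-org/4x1.8B-moe-qwen-ckpt-18000"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",       # spread the 4-expert model across available devices
    trust_remote_code=True,  # the MoE architecture needs the bundled modeling code
)

inputs = tokenizer("你好，世界", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```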
|
|
|
![Training loss during continued pretraining](loss_plot.png)
|
|
|
# Evaluations |
|
|
|
| Groups      | Metric   |  Value |   | Stderr |
|-------------|----------|-------:|---|-------:|
| boolq       | acc      | 0.6502 | ± | 0.0083 |
| ceval-valid | acc      | 0.5171 | ± | 0.1872 |
|             | acc_norm | 0.5171 | ± | 0.1872 |
| cmmlu       | acc      | 0.5041 | ± | 0.1222 |
|             | acc_norm | 0.5041 | ± | 0.1222 |
| mathqa      | acc      | 0.2693 | ± | 0.0081 |
|             | acc_norm | 0.2693 | ± | 0.0081 |
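The table layout matches the output of EleutherAI's lm-evaluation-harness, so the scores can plausibly be reproduced along these lines. The use of the harness and the repository id are assumptions; task names follow the harness's conventions:

```python
import lm_eval

# Hypothetical repository id; trust_remote_code loads the bundled MoE code.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/4x1.8B-moe-qwen-ckpt-18000,trust_remote_code=True",
    tasks=["boolq", "ceval-valid", "cmmlu", "mathqa"],
    batch_size=8,
)
print(results["results"])
```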
|
|
|
# Acknowledgements |
|
|
|
+ [Qwen](https://github.com/QwenLM/Qwen) |
|
+ [mistral.ai](https://mistral.ai) |
|
|
|
# License Agreement |
|
|
|
This project is open-sourced under the Tongyi Qianwen Research License Agreement. The complete license text is available here: [LICENSE](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).
|
|
|
When using this project, please make sure your use complies with the terms and conditions of the license agreement.