Pretrain-Qwen-500M

paper | code

Pretrain-Qwen-500M is a 500M-parameter model with the Qwen architecture, conventionally pre-trained from scratch on the Pile for 50B tokens.

We also open-source the tokenized pre-training corpus for reproducibility.

It is used as the baseline for MiniLLM-Qwen-500M.
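A minimal loading sketch, assuming the checkpoint can be loaded through the standard `transformers` Auto classes under the repository ID `MiniLLM/Pretrain-Qwen-500M`. The helper names `load_model` and `generate_text`, the prompt, and the generation settings are illustrative choices, not part of the original card.

```python
# Repository ID as it appears on the model page.
MODEL_ID = "MiniLLM/Pretrain-Qwen-500M"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and causal LM (hypothetical helper).

    Assumes the checkpoint is compatible with the generic Auto classes.
    The import is kept local so that defining this function does not
    require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate_text(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a short continuation of `prompt` (hypothetical helper)."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, for repeatable output
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_text("The Pile is a large text corpus that"))
```

Since this is a base model pre-trained on raw text rather than an instruction-tuned one, it is best prompted with text to continue, not with chat-style instructions.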

Evaluation

MiniPLM models achieve better performance given the same computation and scale well across model sizes.

Other Baselines

Citation

TODO

Safetensors · Model size: 464M params · Tensor type: F32
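As a rough guide to the memory needed for the weights alone, the parameter count and tensor type above can be multiplied out. This is a back-of-the-envelope sketch: the 464M figure is the rounded value shown on the page, `weight_bytes` is a hypothetical helper, and activations and any optimizer state are ignored.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Approximate storage for the weights only (hypothetical helper)."""
    return num_params * bytes_per_param


NUM_PARAMS = 464_000_000  # rounded count from the model page

# F32 stores each parameter in 4 bytes.
f32_bytes = weight_bytes(NUM_PARAMS, 4)
print(f"F32 weights: about {f32_bytes / 1e9:.2f} GB")

# Assumption for comparison: casting to a 16-bit type halves this.
half_bytes = weight_bytes(NUM_PARAMS, 2)
print(f"16-bit weights: about {half_bytes / 1e9:.2f} GB")
```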
