---
library_name: transformers
license: apache-2.0
datasets:
- monology/pile-uncopyrighted
- MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
# MiniPLM-QWen-200M

MiniPLM-QWen-200M is a 200M-parameter model with the QWen architecture, pre-trained from scratch on the Pile using the MiniPLM knowledge distillation framework with the official QWen1.5-1.8B model as the teacher.
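The model can be used with the standard `transformers` causal language modeling classes. Below is a minimal inference sketch; the repository id is an assumption based on the model name, so check the model page for the exact id.

```python
# Minimal inference sketch with transformers (repository id below is assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-QWen-200M"  # assumed repo id; verify on the model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a large-scale dataset", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```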
We also open-source the pre-training corpus refined by Difference Sampling in MiniPLM for reproducibility.
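As a rough sketch, the refined corpus can be streamed with the `datasets` library; the split name and the use of streaming are assumptions, so adjust them to the dataset card.

```python
# Sketch of streaming the Difference Sampling corpus (split name is assumed).
from datasets import load_dataset

corpus = load_dataset(
    "MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5",
    split="train",     # assumed split name
    streaming=True,    # avoid downloading the full corpus up front
)

# Peek at the first example to inspect the schema.
print(next(iter(corpus)))
```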
## Evaluation
MiniPLM models achieve better performance given the same training compute and scale well across model sizes.
## Baseline Models
## Citation
```bibtex
@misc{gu2024miniplmknowledgedistillationpretraining,
      title={MiniPLM: Knowledge Distillation for Pre-Training Language Models},
      author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
      year={2024},
      eprint={2410.17215},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.17215},
}
```