---
library_name: transformers
license: apache-2.0
datasets:
- monology/pile-uncopyrighted
- MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
# MiniPLM-QWen-200M

MiniPLM-QWen-200M is a 200M-parameter model with the QWen architecture, pre-trained from scratch on the Pile using the MiniPLM knowledge distillation framework with the official QWen1.5-1.8B model as the teacher.
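The model can be used with the standard `transformers` causal language modeling classes. Below is a minimal inference sketch; the repository id is an assumption based on the model name, so check the model page for the exact id.

```python
# Minimal inference sketch with transformers (repository id below is assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-QWen-200M"  # assumed repo id; verify on the model page

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a large-scale dataset", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```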
We also open-source the pre-training corpus refined by Difference Sampling in MiniPLM for reproducibility.
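As a rough sketch, the refined corpus can be streamed with the `datasets` library; the split name and the use of streaming are assumptions, so adjust them to the dataset card.

```python
# Sketch of streaming the Difference Sampling corpus (split name is assumed).
from datasets import load_dataset

corpus = load_dataset(
    "MiniLLM/pile-diff_samp-qwen_1.8B-qwen_104M-r0.5",
    split="train",     # assumed split name
    streaming=True,    # avoid downloading the full corpus up front
)

# Peek at the first example to inspect the schema.
print(next(iter(corpus)))
```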
## Evaluation
MiniPLM models achieve better performance given the same training compute and scale well across model sizes.
## Baseline Models
## Citation
```bibtex
@misc{gu2024miniplmknowledgedistillationpretraining,
      title={MiniPLM: Knowledge Distillation for Pre-Training Language Models},
      author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
      year={2024},
      eprint={2410.17215},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.17215},
}
```