---
language:
- en
- zh
license: apache-2.0
tags:
- axolotl
- generated_from_trainer
base_model: Qwen/Qwen2-0.5B
datasets:
- Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
model-index:
- name: Qwen2-0.5B-Abyme
  results: []
---

[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)
See axolotl config

axolotl version: `0.4.1`

```yaml
adapter: null
base_model: Qwen/Qwen2-0.5B
bf16: auto
chat_template: chatml
dataset_prepared_path: ./data/last_run_prepared
datasets:
- path: Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
  type: sharegpt
deepspeed: null
early_stopping_patience: null
eval_sample_packing: true
evals_per_epoch: 4
flash_attention: true
fp16: null
fsdp: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
group_by_length: false
hf_use_auth_token: true
hub_model_id: CoolSpring/Qwen2-0.5B-Abyme
learning_rate: 2e-5
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lr_scheduler: cosine
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_torch
output_dir: ./outputs/out
pad_to_sequence_len: true
resize_token_embeddings_to_32x: true
resume_from_checkpoint: null
sample_packing: true
saves_per_epoch: 1
sequence_len: 4096
tf32: true
tokens:
- <|im_start|>
- <|im_end|>
train_on_inputs: false
val_set_size: 0.05
wandb_entity: null
wandb_log_model: null
wandb_name: Qwen2-0.5B-Abyme
wandb_project: Qwen2-0.5B-Magpie-Qwen2-Pro-300K-Filtered
wandb_watch: null
warmup_steps: 100
weight_decay: null
xformers_attention: null
```
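
To make the batch and schedule settings in the config concrete, here is a rough sketch of the arithmetic they imply. It is not Axolotl's own code. It assumes a single training device (consistent with the reported total train batch size of 16), a linear warmup followed by a cosine decay to zero, and a total of about 2387 optimizer steps, estimated from the results table (step 597 at epoch 0.2501). The function names are illustrative.

```python
import math

# Values taken from the config above.
MICRO_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
SEQUENCE_LEN = 4096
PEAK_LR = 2e-5
WARMUP_STEPS = 100

# Assumptions, not stated in the config.
NUM_DEVICES = 1      # assumed single device
TOTAL_STEPS = 2387   # estimated from the results table: 597 / 0.2501


def effective_batch_size(micro=MICRO_BATCH_SIZE, accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of (packed) sequences consumed per optimizer step."""
    return micro * accum * devices


def max_tokens_per_step(seq_len=SEQUENCE_LEN):
    """Upper bound on tokens per optimizer step when sequences are packed
    and padded to the full sequence length."""
    return effective_batch_size() * seq_len


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("sequences per optimizer step:", effective_batch_size())
    print("max tokens per optimizer step:", max_tokens_per_step())
    for s in (0, 50, 100, 1194, TOTAL_STEPS):
        print(f"step {s:>4}: lr = {lr_at(s):.3e}")
```

Under these assumptions each optimizer step sees 16 packed sequences of up to 4096 tokens, and the learning rate reaches its peak of 2e-5 at step 100 before decaying.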

# Qwen2-0.5B-Abyme

This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset. It was created to explore the effects of training the smallest model in the Qwen2 series on data extracted from the largest model in the Qwen2 series (as of July 18th, 2024).

It achieves the following results on the evaluation set:
- Loss: 0.8229

## Model description

Qwen2-0.5B-Abyme is a 0.5 billion parameter language model fine-tuned on a dataset of conversation samples from the much larger 72 billion parameter Qwen2-72B model. The purpose of this experiment is to investigate whether a smaller model can effectively learn and reproduce the knowledge and capabilities of a significantly larger model through fine-tuning.

## Intended uses & limitations

This model is intended for research on knowledge transfer and distillation in language models. It may also have practical applications where the computational resources for running large language models are limited and a smaller, fine-tuned model can provide comparable performance.

However, the model's capabilities and limitations have not yet been fully evaluated. Its performance may vary depending on the task and domain, and it may exhibit biases or limitations inherited from the original models.

## Training and evaluation data

The model was fine-tuned on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset, which contains 300,000 conversation samples from the Qwen2-72B model. 5% of this dataset was held out as the evaluation set for calculating the reported loss metric.
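
The training config sets `chat_template: chatml` and adds the `<|im_start|>` and `<|im_end|>` tokens, so prompts at inference time should presumably follow the same layout. The sketch below shows the plain ChatML layout only. The helper name `to_chatml` is illustrative, and the tokenizer shipped with the model may behave differently (for example by inserting a default system message), so treat this as an approximation of the format, not the exact template.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in plain ChatML.

    Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    Assumption: no default system message is injected.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = to_chatml([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name one use of a small language model."},
    ])
    print(prompt)
```

In practice, calling `tokenizer.apply_chat_template(...)` from Transformers on the model's own tokenizer is the safer route, since it uses whatever template is actually stored with the model.
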
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9947        | 0.0004 | 1    | 0.9683          |
| 0.8385        | 0.2501 | 597  | 0.8338          |
| 0.7636        | 0.5002 | 1194 | 0.8249          |
| 0.8124        | 0.7502 | 1791 | 0.8229          |

### Framework versions

- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CoolSpring__Qwen2-0.5B-Abyme)

| Metric              | Value |
|---------------------|------:|
| Avg.                |  4.76 |
| IFEval (0-Shot)     | 19.15 |
| BBH (3-Shot)        |  2.28 |
| MATH Lvl 5 (4-Shot) |  1.51 |
| GPQA (0-shot)       |  0.45 |
| MuSR (0-shot)       |  1.48 |
| MMLU-PRO (5-shot)   |  3.70 |
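
As a quick sanity check on the leaderboard table, the `Avg.` row can be reproduced from the six benchmark scores. This assumes the average is a plain unweighted mean, which the reported numbers are consistent with. The dictionary below simply copies the values from the table.

```python
# Scores copied from the leaderboard table above.
scores = {
    "IFEval (0-Shot)": 19.15,
    "BBH (3-Shot)": 2.28,
    "MATH Lvl 5 (4-Shot)": 1.51,
    "GPQA (0-shot)": 0.45,
    "MuSR (0-shot)": 1.48,
    "MMLU-PRO (5-shot)": 3.70,
}

# Assumption: "Avg." is the unweighted mean of the six benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"Avg. = {average:.2f}")
```

IFEval accounts for most of the average, so the headline number says more about instruction following than about reasoning or knowledge benchmarks.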