CoolSpring committed
Commit: a48b7c0
Parent: ee31535

Update README.md

Files changed (1):
  1. README.md +14 -7
README.md CHANGED
@@ -7,6 +7,11 @@ tags:
 model-index:
 - name: Qwen2-0.5B-Abyme
   results: []
+datasets:
+- Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
+language:
+- en
+- zh
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -74,24 +79,26 @@ xformers_attention: null
 
 </details><br>
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/coolspring-none/Qwen2-0.5B-Magpie-Qwen2-Pro-300K-Filtered/runs/qcne24ii)
 # Qwen2-0.5B-Abyme
 
-This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the None dataset.
+This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset. It was created to explore the effects of training the smallest model in the Qwen2 series on data extracted from the largest model in the Qwen2 series (as of July 18th, 2024).
+
 It achieves the following results on the evaluation set:
-- Loss: 0.8229
+- Loss: 0.8229
 
 ## Model description
 
-More information needed
+Qwen2-0.5B-Abyme is a 0.5 billion parameter language model fine-tuned on a dataset of conversation samples from the much larger 72 billion parameter Qwen2-72B model. The purpose of this experiment is to investigate whether a smaller model can effectively learn and reproduce the knowledge and capabilities of a significantly larger model through the fine-tuning process.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended for research purposes to study the knowledge transfer and distillation capabilities of language models. It may have practical applications in scenarios where the computational resources for running large language models are limited, and a smaller, fine-tuned model can provide comparable performance.
+
+However, it is important to note that the model's capabilities and limitations are yet to be fully evaluated. Its performance may vary depending on the task and domain, and it may exhibit biases or limitations inherited from the original models.
 
 ## Training and evaluation data
 
-More information needed
+The model was fine-tuned on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset, which contains 300,000 conversation samples from the Qwen2-72B model. 5% of this dataset was held out as the evaluation set for calculating the reported loss metric.
 
 ## Training procedure
 
@@ -124,4 +131,4 @@ The following hyperparameters were used during training:
 - Transformers 4.42.3
 - Pytorch 2.3.1+cu121
 - Datasets 2.19.1
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
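
For context on how the model described in the updated card would typically be used, the sketch below loads it with `transformers` and runs a single chat turn. The repo id `CoolSpring/Qwen2-0.5B-Abyme` and the use of Qwen2's chat template are assumptions inferred from the card name and committer, not something this diff states.

```python
# Minimal usage sketch. Assumptions (not stated in the diff): the model is published as
# "CoolSpring/Qwen2-0.5B-Abyme" and keeps the Qwen2 chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CoolSpring/Qwen2-0.5B-Abyme"  # hypothetical repo id inferred from the card name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize the Qwen2 model family in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```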
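
The "Training and evaluation data" paragraph mentions a 5% held-out evaluation split, but the diff does not show how that split is produced (it is presumably handled by the training configuration). The snippet below is only a rough equivalent using the `datasets` library; the seed and split method are illustrative assumptions.

```python
# Rough sketch of the 95/5 split described in the card; the real split is made by the
# training setup, so the seed and method here are illustrative assumptions.
from datasets import load_dataset

dataset = load_dataset("Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered", split="train")
split = dataset.train_test_split(test_size=0.05, seed=42)
train_data, eval_data = split["train"], split["test"]
print(len(train_data), len(eval_data))  # roughly 285k training samples, 15k evaluation samples
```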