CoolSpring committed
Commit: a48b7c0
Parent: ee31535

Update README.md

Files changed (1):
  1. README.md +14 -7
README.md CHANGED
@@ -7,6 +7,11 @@ tags:
 model-index:
 - name: Qwen2-0.5B-Abyme
   results: []
+datasets:
+- Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
+language:
+- en
+- zh
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -74,24 +79,26 @@ xformers_attention: null
 
 </details><br>
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/coolspring-none/Qwen2-0.5B-Magpie-Qwen2-Pro-300K-Filtered/runs/qcne24ii)
 # Qwen2-0.5B-Abyme
 
-This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the None dataset.
+This model is a fine-tuned version of [Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B) on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset. It was created to explore the effects of training the smallest model in the Qwen2 series on data extracted from the largest model in the Qwen2 series (as of July 18th, 2024).
+
 It achieves the following results on the evaluation set:
-- Loss: 0.8229
+- Loss: 0.8229
 
 ## Model description
 
-More information needed
+Qwen2-0.5B-Abyme is a 0.5 billion parameter language model fine-tuned on a dataset of conversation samples from the much larger 72 billion parameter Qwen2-72B model. The purpose of this experiment is to investigate whether a smaller model can effectively learn and reproduce the knowledge and capabilities of a significantly larger model through the fine-tuning process.
 
 ## Intended uses & limitations
 
-More information needed
+This model is intended for research purposes to study the knowledge transfer and distillation capabilities of language models. It may have practical applications in scenarios where the computational resources for running large language models are limited, and a smaller, fine-tuned model can provide comparable performance.
+
+However, it is important to note that the model's capabilities and limitations are yet to be fully evaluated. Its performance may vary depending on the task and domain, and it may exhibit biases or limitations inherited from the original models.
 
 ## Training and evaluation data
 
-More information needed
+The model was fine-tuned on the [Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered) dataset, which contains 300,000 conversation samples from the Qwen2-72B model. 5% of this dataset was held out as the evaluation set for calculating the reported loss metric.
 
 ## Training procedure
 
@@ -124,4 +131,4 @@ The following hyperparameters were used during training:
 - Transformers 4.42.3
 - Pytorch 2.3.1+cu121
 - Datasets 2.19.1
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
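
For context on how the model described in the updated card would typically be used, the sketch below loads it with `transformers` and runs a single chat turn. The repo id `CoolSpring/Qwen2-0.5B-Abyme` and the use of Qwen2's chat template are assumptions inferred from the card name and committer, not something this diff states.

```python
# Minimal usage sketch. Assumptions (not stated in the diff): the model is published as
# "CoolSpring/Qwen2-0.5B-Abyme" and keeps the Qwen2 chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CoolSpring/Qwen2-0.5B-Abyme"  # hypothetical repo id inferred from the card name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize the Qwen2 model family in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```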
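
The "Training and evaluation data" paragraph mentions a 5% held-out evaluation split, but the diff does not show how that split is produced (it is presumably handled by the training configuration). The snippet below is only a rough equivalent using the `datasets` library; the seed and split method are illustrative assumptions.

```python
# Rough sketch of the 95/5 split described in the card; the real split is made by the
# training setup, so the seed and method here are illustrative assumptions.
from datasets import load_dataset

dataset = load_dataset("Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered", split="train")
split = dataset.train_test_split(test_size=0.05, seed=42)
train_data, eval_data = split["train"], split["test"]
print(len(train_data), len(eval_data))  # roughly 285k training samples, 15k evaluation samples
```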