Update README.md
README.md (CHANGED)
````diff
@@ -16,8 +16,7 @@ pipeline_tag: text-generation
 model-index:
 - name: zephyr-7b-beta
   results:
-
-  # AI2 Reasoning Challenge (25-Shot) (Open LLM Leaderboard)
+  # AI2 Reasoning Challenge (25-Shot)
   - task:
       type: text-generation
       name: Text Generation
@@ -43,7 +42,7 @@ model-index:
       name: Open LLM Leaderboard
       url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
 
-  # HellaSwag (10-shot)
+  # HellaSwag (10-shot)
   - task:
       type: text-generation
       name: Text Generation
@@ -68,7 +67,7 @@ model-index:
       name: Open LLM Leaderboard
       url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
 
-  # DROP (3-shot)
+  # DROP (3-shot)
   - task:
       type: text-generation
       name: Text Generation
@@ -93,7 +92,7 @@ model-index:
       name: Open LLM Leaderboard
       url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
 
-  # TruthfulQA (0-shot)
+  # TruthfulQA (0-shot)
   - task:
       type: text-generation
       name: Text Generation
@@ -117,7 +116,7 @@ model-index:
       name: Open LLM Leaderboard
       url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
 
-  # GSM8k (5-shot)
+  # GSM8k (5-shot)
   - task:
       type: text-generation
       name: Text Generation
@@ -137,7 +136,7 @@ model-index:
       name: Open LLM Leaderboard
       url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
 
-  # MMLU (5-Shot)
+  # MMLU (5-Shot)
   # ???
 
   # AlpacaEval (taken from model card)
@@ -296,7 +295,9 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 3.0
+
 ### Training results
+
 The table below shows the full set of DPO training metrics:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
@@ -358,12 +359,16 @@ The table below shows the full set of DPO training metrics:
 | 0.0077 | 2.89 | 5600 | 0.7520 | -4.5586 | -8.3485 | 0.7969 | 3.7899 | -340.4545 | -299.8206 | -2.3078 | -2.3517 |
 | 0.0094 | 2.94 | 5700 | 0.7527 | -4.5542 | -8.3509 | 0.7812 | 3.7967 | -340.4790 | -299.7773 | -2.3062 | -2.3510 |
 | 0.0054 | 2.99 | 5800 | 0.7520 | -4.5169 | -8.3079 | 0.7812 | 3.7911 | -340.0493 | -299.4038 | -2.3081 | -2.3530 |
+
 ### Framework versions
+
 - Transformers 4.35.0.dev0
 - Pytorch 2.0.1+cu118
 - Datasets 2.12.0
 - Tokenizers 0.14.0
+
 ## Citation
+
 If you find Zephyr-7B-β useful in your work, please cite it with:
 ```
 @misc{tunstall2023zephyr,
````
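The scheduler hyperparameters above (`lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1`) describe linear warmup followed by linear decay. Here is a minimal sketch of that shape; `base_lr` is an assumed illustrative value (the base learning rate does not appear in this excerpt), and the step budget of 5800 is taken from the last row of the metrics table, so the true total may differ slightly:

```python
def linear_lr(step, total_steps, base_lr, warmup_ratio=0.1):
    """Linear warmup to base_lr over warmup_ratio of training, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# base_lr is a hypothetical value chosen for illustration only.
base_lr, total_steps = 5e-6, 5800
lrs = [linear_lr(s, total_steps, base_lr) for s in range(total_steps + 1)]
# The peak LR is reached at the end of warmup (step 580 here), then decays to 0.
```

With `warmup_ratio=0.1`, the first tenth of the run ramps the learning rate up, matching the `0.1` in the hyperparameter list.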
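A consistency check on the DPO metrics: `Rewards/margins` is by definition `Rewards/chosen` minus `Rewards/rejected`, and the three table rows quoted above bear that out to within rounding. A quick sanity check with the values copied from the table:

```python
# (rewards_chosen, rewards_rejected, reported_margin) from steps 5600, 5700, 5800
rows = [
    (-4.5586, -8.3485, 3.7899),
    (-4.5542, -8.3509, 3.7967),
    (-4.5169, -8.3079, 3.7911),
]

for chosen, rejected, margin in rows:
    # Metrics are logged to 4 decimal places, so allow rounding slack.
    assert abs((chosen - rejected) - margin) <= 5e-4
```

The `Logps` columns cannot be checked the same way: DPO rewards are log-probability ratios against a reference model (scaled by beta), not the raw policy log-probs shown, so only the margin identity is verified here.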