Mistral-7B-v0.1 fine-tuned on the UltraFeedback dataset using the techniques presented in the paper [Self-Rewarding Language Models](https://arxiv.org/abs/2401.10020).

## Results

| model_name       |  Average | arc_challenge | hellaswag | truthfulqa_mc2 | winogrande |
|:-----------------|---------:|--------------:|----------:|---------------:|-----------:|
| Zenith-7B-dpo-v3 | 0.707576 |      0.613481 |  0.848337 |       0.602897 |   0.765588 |

## Instruction format

To leverage instruction fine-tuning, your prompt should be wrapped in `[INST]` and `[/INST]` tokens. Only the very first instruction should begin with the beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation ends when the end-of-sentence (EOS) token id is produced.

For example:

```
text = "<s>[INST] What is your favourite condiment? [/INST]"
```
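
Below is a minimal sketch of using this format with the `transformers` library. It assumes the tokenizer ships the standard Mistral chat template (which inserts the `[INST]`/`[/INST]` markers and the BOS token automatically); the repository id is a placeholder, not confirmed by this card.

```python
# Minimal sketch, not an official usage snippet for this model. Assumes the
# tokenizer bundles the standard Mistral chat template and that the model is
# published on the Hub; the repository id below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<your-namespace>/Zenith-7B-dpo-v3"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# apply_chat_template wraps the user turn in [INST] ... [/INST] and prepends
# the beginning-of-sentence token, matching the format described above.
messages = [{"role": "user", "content": "What is your favourite condiment?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generation stops when the end-of-sentence token is emitted (or at the cap).
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If you instead build the prompt string by hand, as in the example above, tokenize it with `add_special_tokens=False` so the `<s>` already present in the string is not duplicated by the tokenizer.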