Mistral-7B-v0.1 fine-tuned on the UltraFeedback dataset using the techniques presented in the paper [Self-Rewarding Language Models](https://arxiv.org/abs/2401.10020).

## Results

| model_name       |  Average | arc_challenge | hellaswag | truthfulqa_mc2 | winogrande |
|:-----------------|---------:|--------------:|----------:|---------------:|-----------:|
| Zenith-7B-dpo-v3 | 0.707576 |      0.613481 |  0.848337 |       0.602897 |   0.765588 |

## Instruction format

To leverage instruction fine-tuning, your prompt should be wrapped in `[INST]` and `[/INST]` tokens. Only the very first instruction should begin with the beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation ends when the end-of-sentence (EOS) token id is produced.

For example:

```
text = "<s>[INST] What is your favourite condiment? [/INST]"
```
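
Below is a minimal sketch of using this format with the `transformers` library. It assumes the tokenizer ships the standard Mistral chat template (which inserts the `[INST]`/`[/INST]` markers and the BOS token automatically); the repository id is a placeholder, not confirmed by this card.

```python
# Minimal sketch, not an official usage snippet for this model. Assumes the
# tokenizer bundles the standard Mistral chat template and that the model is
# published on the Hub; the repository id below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<your-namespace>/Zenith-7B-dpo-v3"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# apply_chat_template wraps the user turn in [INST] ... [/INST] and prepends
# the beginning-of-sentence token, matching the format described above.
messages = [{"role": "user", "content": "What is your favourite condiment?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Generation stops when the end-of-sentence token is emitted (or at the cap).
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If you instead build the prompt string by hand, as in the example above, tokenize it with `add_special_tokens=False` so the `<s>` already present in the string is not duplicated by the tokenizer.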