JW17 committed on
Commit c9c3f0c
Parent: b2c71a5

Update README.md

Files changed (1): README.md (+59, -0)
README.md CHANGED
@@ -1,3 +1,62 @@
  ---
+ language:
+ - en
  license: apache-2.0
+ base_model:
+ - mistralai/Mistral-7B-v0.1
+ datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
+ pipeline_tag: text-generation
+ model-index:
+ - name: Mistral-ORPO-⍺
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       name: AlpacaEval 1
+       type: AlpacaEval
+     metrics:
+     - type: AlpacaEval 1.0
+       value: 87.92%
+       name: Win Rate
+     - type: AlpacaEval 2.0
+       value: 11.33%
+       name: Win Rate
+     source:
+       url: https://github.com/tatsu-lab/alpaca_eval
+       name: self-reported
+   - task:
+       type: text-generation
+     dataset:
+       name: MT-Bench
+       type: MT-Bench
+     metrics:
+     - type: MT-Bench
+       value: 7.23
+       name: Score
+     source:
+       url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
+       name: self-reported
  ---
+ # **Mistral-ORPO-⍺ (7B)**
+
+ **Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) trained with *odds ratio preference optimization (ORPO)*. With ORPO, the model learns preferences directly, without a separate supervised fine-tuning warm-up phase. **Mistral-ORPO-⍺** is fine-tuned exclusively on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
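+
+ For intuition, here is a minimal sketch of the ORPO objective as described in the ORPO paper: the usual SFT loss on the chosen response plus a weighted odds-ratio term that pushes the odds of the chosen response above those of the rejected one. This helper is illustrative only (it is not this repository's training code), and it assumes the length-averaged log-likelihoods have already been computed.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def orpo_loss(chosen_logps, rejected_logps, nll_chosen, lam=0.1):
+     """Sketch of the ORPO objective (illustrative, not the actual training code).
+
+     chosen_logps / rejected_logps: length-averaged log p(y|x) per example.
+     nll_chosen: standard SFT negative log-likelihood on the chosen response.
+     lam: weight of the odds-ratio term (lambda in the paper; value here is a guess).
+     """
+     # log odds(y) = log p - log(1 - p), computed stably from log p
+     log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
+     log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
+     # L_OR = -log sigmoid(log odds(y_chosen) - log odds(y_rejected))
+     l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
+     return (nll_chosen + lam * l_or).mean()
+ ```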
+
+ ## Model Performance
+
+ |Model Name|Size|Align|MT-Bench|AlpacaEval 1.0 (%)|AlpacaEval 2.0 (%)|
+ |:--------|:--------------:|:--------------:|:-------------------:|:------------:|:------------:|
+ |**Mistral-<tt>ORPO</tt>-⍺**|7B|<tt>ORPO</tt>|7.23|87.92|11.33|
+ |**Mistral-<tt>ORPO</tt>-β**|7B|<tt>ORPO</tt>|7.32|91.41|12.20|
+ |Zephyr β|7B|DPO|7.34|90.60|10.99|
+ |TULU-2-DPO|13B|DPO|7.00|89.50|10.12|
+ |Llama-2-Chat|7B|RLHF|6.27|71.37|4.96|
+ |Llama-2-Chat|13B|RLHF|6.65|81.09|7.70|
+
+ ## Chat Template
+ ```
+ <|user|>
+ Hi! How are you doing?</s>
+ <|assistant|>
+ ```
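+
+ Below is a minimal usage sketch showing how to generate text in the chat format above with 🤗 Transformers. The repository id is a placeholder (it is not stated in this card): substitute the actual Hub id of this model.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "kaist-ai/mistral-orpo-alpha"  # placeholder: replace with this model's Hub id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+
+ # Build the prompt in the chat format shown above.
+ messages = [{"role": "user", "content": "Hi! How are you doing?"}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ outputs = model.generate(inputs, max_new_tokens=128)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```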