# **Mistral-ORPO-⍺ (7B)**

**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) trained with *odds ratio preference optimization (ORPO)*. With ORPO, the model learns preferences directly, without a supervised fine-tuning warmup phase. **Mistral-ORPO-⍺** is fine-tuned exclusively on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

- **GitHub Repository**: https://github.com/xfactlab/orpo
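The ORPO objective described above couples a standard negative log-likelihood term on the chosen response with a log odds-ratio penalty between chosen and rejected responses. A minimal sketch for a single preference pair, assuming average token log-probabilities as inputs (the function name and the `lam` weight here are illustrative, not this repository's training code):

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative single-pair ORPO objective (lam is a placeholder weight)."""
    # odds(y) = P(y) / (1 - P(y)), computed in log space from average token log-probs
    log_odds_chosen = logp_chosen - math.log1p(-math.exp(logp_chosen))
    log_odds_rejected = logp_rejected - math.log1p(-math.exp(logp_rejected))
    # L_OR = -log sigmoid(log odds ratio): small when the chosen response is favored
    log_odds_ratio = log_odds_chosen - log_odds_rejected
    loss_or = math.log1p(math.exp(-log_odds_ratio))
    # total loss = NLL on the chosen response + lam * odds-ratio relaxation
    return -logp_chosen + lam * loss_or
```

Because the penalty is computed from the policy's own odds, no frozen reference model is needed, which is what lets ORPO skip the separate SFT-then-align pipeline.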

## 👍 **Model Performance**

### 1) AlpacaEval & MT-Bench

|Model Name|Size|Align|MT-Bench|AlpacaEval 1.0|AlpacaEval 2.0|
|:--------|:--------------:|:--------------:|:-------------------:|:------------:|:------------:|
|Llama-2-Chat |7B|RLHF|6.27|71.37|4.96|
|Llama-2-Chat |13B|RLHF|6.65|81.09|7.70|

### 2) IFEval

| **Model Type**     | **Prompt-Strict** | **Prompt-Loose** | **Inst-Strict** | **Inst-Loose** |
|--------------------|:-----------------:|:----------------:|:---------------:|:--------------:|
| **Mistral-ORPO-⍺** | 0.5009            | 0.5083           | 0.5995          | 0.6163         |
| **Mistral-ORPO-β** | 0.5287            | 0.5564           | 0.6355          | 0.6619         |

## 🗺️ **MT-Bench by Category**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6415c043486c7c9a5d151583/1Ifpt0ljCfJPEoZAqlqqy.png)

## 🖥️ **Inference**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (repository id assumed from this card)
model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-alpha")
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-alpha")

# Format the query with the model's chat template
query = [{"role": "user", "content": "Hi! How are you doing?"}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a response (sampling settings are illustrative)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
response = tokenizer.batch_decode(output)

#<|user|>
#Hi! How are you doing?</s>
#<|assistant|>
#I'm doing well, thank you! How are you?</s>
```
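The commented transcript above shows the chat format the tokenizer's template renders. As a plain-Python sketch of that format, with the template string inferred from the example output rather than read from the tokenizer config:

```python
def format_chat(messages):
    """Render messages in the <|user|>/<|assistant|> format shown above (inferred)."""
    parts = []
    for m in messages:
        # each turn: role tag, newline, content, end-of-sequence token
        parts.append(f"<|{m['role']}|>\n{m['content']}</s>\n")
    # open an assistant turn so generation continues from here
    parts.append("<|assistant|>\n")
    return "".join(parts)

print(format_chat([{"role": "user", "content": "Hi! How are you doing?"}]))
```

In practice, prefer `tokenizer.apply_chat_template(...)`, which reads the authoritative template from the tokenizer config.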

## 📎 **Citation**

```
@misc{hong2024orpo,
      title={ORPO: Monolithic Preference Optimization without Reference Model},
      author={Jiwoo Hong and Noah Lee and James Thorne},
      year={2024},
      eprint={2403.07691},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```