Update README.md
README.md
CHANGED
@@ -5,6 +5,15 @@ datasets:
base_model:
- Qwen/Qwen2-7B-Instruct
---
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -42,3 +51,109 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# response: {"原因": "回答1中的-1是错误的,因为sigmoid函数的实际输出范围是0到1,而不是包括-1。回答2准确地描述了sigmoid函数的输出范围是0到1。",\n "更好的回答": 2}

```
base_model:
- Qwen/Qwen2-7B-Instruct
---

## Introduction

Con-J-Qwen2-7B (learning the generative ***J***udge using self-generated ***Con***trastive judgments) is an advanced generative judge built on the Qwen2-7B-Instruct architecture and trained on the Skywork/Skywork-Reward-Preference-80K-v0.1 dataset.

Con-J-Qwen2-7B is trained from preference data. We prompt the pre-trained Qwen2-7B-Instruct model to generate positive and negative judgments, each supported by a rationale in natural language. These self-generated contrastive judgment pairs are then used to train the generative judge with Direct Preference Optimization (DPO). In this way, Con-J learns to act as a generative judge that gives accurate judgments together with supporting rationales.
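
The pairing step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the helper name `build_contrastive_pairs`, the `better_answer` key, and the rule that a judgment counts as positive when its verdict matches the human preference are all assumptions made for the example. The `prompt`/`chosen`/`rejected` layout is the one commonly consumed by DPO tooling.

```python
import json
from itertools import product


def build_contrastive_pairs(prompt, sampled_judgments, preferred):
    """Pair judgments that agree with the preference label against those that do not.

    Hypothetical helper: `sampled_judgments` is a list of JSON strings, each
    holding a rationale and a verdict under the assumed key "better_answer".
    """
    positives, negatives = [], []
    for text in sampled_judgments:
        try:
            verdict = json.loads(text)["better_answer"]
        except (json.JSONDecodeError, KeyError, TypeError):
            continue  # skip judgments that do not follow the assumed format
        (positives if verdict == preferred else negatives).append(text)
    # Every (positive, negative) combination becomes one DPO preference example.
    return [
        {"prompt": prompt, "chosen": pos, "rejected": neg}
        for pos, neg in product(positives, negatives)
    ]


samples = [
    '{"rationale": "Answer 2 states the correct range.", "better_answer": 2}',
    '{"rationale": "Answer 1 is more detailed.", "better_answer": 1}',
    "not json",
]
pairs = build_contrastive_pairs("Which answer is better?", samples, preferred=2)
print(len(pairs))
```

Malformed judgments are dropped rather than repaired, so a prompt yields no pairs unless at least one positive and one negative judgment were sampled.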

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# response: {"原因": "回答1中的-1是错误的,因为sigmoid函数的实际输出范围是0到1,而不是包括-1。回答2准确地描述了sigmoid函数的输出范围是0到1。",\n "更好的回答": 2}

```
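
The sample response above is a JSON object with two keys: `原因` ("reason", the rationale) and `更好的回答` ("better answer", the index of the preferred answer). In the sample, the rationale says that answer 1 is wrong to include -1 because the sigmoid function outputs values between 0 and 1, and that answer 2 describes this range accurately. A minimal sketch of reading the verdict from such a string, assuming the model always emits these two keys (the helper `parse_judgment` is hypothetical and not part of the model card):

```python
import json


def parse_judgment(response: str):
    """Return (better_answer, rationale), or None if the text is not a valid judgment."""
    try:
        data = json.loads(response)
        return int(data["更好的回答"]), data["原因"]
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None  # the model did not follow the expected JSON format


# Shortened stand-in for the sample response shown above.
example = '{"原因": "回答2准确地描述了sigmoid函数的输出范围是0到1。",\n "更好的回答": 2}'
result = parse_judgment(example)
print(result)
```

Returning `None` instead of raising lets a caller retry generation or skip the sample when the output is malformed.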

## Performance

<table>
  <tr>
    <th rowspan="2">Model</th>
    <th rowspan="2">Infinity-<br>Preference</th>
    <th rowspan="2">Ultra-<br>Feedback</th>
    <th rowspan="2">PKU-<br>SafeRLHF</th>
    <th colspan="4">Reward-Bench</th>
  </tr>
  <tr>
    <th>Chat</th>
    <th>Chat-H</th>
    <th>Safety</th>
    <th>Reasoning</th>
  </tr>
  <tr>
    <td>Llama3.1-8B</td>
    <td>59.0</td>
    <td>62.9</td>
    <td>66.4</td>
    <td>80.7</td>
    <td>49.8</td>
    <td>64.0</td>
    <td>68.1</td>
  </tr>
  <tr>
    <td>Llama3.1-70B</td>
    <td>64.0</td>
    <td>71.4</td>
    <td>67.6</td>
    <td><b>97.2</b></td>
    <td>70.2</td>
    <td>82.8</td>
    <td>86.0</td>
  </tr>
  <tr>
    <td>Qwen2-7B</td>
    <td>59.0</td>
    <td>64.5</td>
    <td>67.2</td>
    <td>91.3</td>
    <td>44.8</td>
    <td>73.6</td>
    <td>69.0</td>
  </tr>
  <tr>
    <td>Qwen2.5-72B</td>
    <td>70.0</td>
    <td>66.0</td>
    <td>58.7</td>
    <td>86.6</td>
    <td>61.4</td>
    <td>74.5</td>
    <td><b>90.7</b></td>
  </tr>
  <tr>
    <td>Auto-J</td>
    <td>69.0</td>
    <td>63.9</td>
    <td>66.9</td>
    <td>93.0</td>
    <td>40.0</td>
    <td>65.5</td>
    <td>50.5</td>
  </tr>
  <tr>
    <td>Prometheus 2</td>
    <td>68.0</td>
    <td>63.3</td>
    <td>63.0</td>
    <td>85.5</td>
    <td>49.1</td>
    <td>77.1</td>
    <td>76.5</td>
  </tr>
  <tr>
    <td>GPT-4o</td>
    <td><u>75.0</u></td>
    <td><u>72.2</u></td>
    <td><b>69.6</b></td>
    <td><u>95.3</u></td>
    <td><u>74.3</u></td>
    <td><u>87.6</u></td>
    <td>86.9</td>
  </tr>
  <tr>
    <td>Con-J (ours)</td>
    <td><b>81.0</b></td>
    <td><b>73.0</b></td>
    <td><u>68.4</u></td>
    <td>91.3</td>
    <td><b>79.6</b></td>
    <td><b>88.0</b></td>
    <td><u>87.1</u></td>
  </tr>
</table>

## Reference

Coming soon.