---
datasets:
- Skywork/Skywork-Reward-Preference-80K-v0.1
base_model:
- Qwen/Qwen2-7B-Instruct
---

## Introduction

Con-J-Qwen2-7B (learning the generative ***J***udge using self-generated ***Con***trastive judgments) is a generative judge built on the Qwen2-7B-Instruct architecture and trained on the Skywork/Skywork-Reward-Preference-80K-v0.1 dataset.

Con-J-Qwen2-7B is trained from preference data. We prompt the pre-trained Qwen2-7B-Instruct model to generate positive and negative judgments, each supported by a rationale in natural language. These self-generated contrastive judgment pairs are then used to train the generative judge with Direct Preference Optimization (DPO). In this way, Con-J learns to act as a generative judge that provides accurate judgments along with supporting rationales.
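
Concretely, writing $x$ for a preference-annotated prompt, $j^{+}$ for a self-generated judgment that agrees with the preference annotation, and $j^{-}$ for one that contradicts it, training minimizes the standard DPO objective (notation ours; $\sigma$ is the logistic function, $\pi_{\mathrm{ref}}$ the frozen Qwen2-7B-Instruct policy, and $\beta$ the DPO temperature):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, j^{+},\, j^{-})}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(j^{+} \mid x)}{\pi_{\mathrm{ref}}(j^{+} \mid x)} - \beta \log \frac{\pi_\theta(j^{-} \mid x)}{\pi_{\mathrm{ref}}(j^{-} \mid x)}\right)\right]
$$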

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ... (model loading, prompt construction, and generation; see the full sketch below) ...

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# response: {"原因": "回答1中的-1是错误的,因为sigmoid函数的实际输出范围是0到1,而不是包括-1。回答2准确地描述了sigmoid函数的输出范围是0到1。",\n "更好的回答": 2}
# Translation: {"reason": "The -1 in Answer 1 is wrong, because the actual output
# range of the sigmoid function is 0 to 1, not including -1. Answer 2 correctly
# describes the sigmoid output range as 0 to 1.", "better_answer": 2}
```
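
For a runnable end-to-end example, a minimal sketch follows. The hub ID `ZiyiYe/Con-J-Qwen2-7B` and the wording of the judge prompt are illustrative assumptions; the exact template used in training may differ.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from this repository's owner and model name.
model_id = "ZiyiYe/Con-J-Qwen2-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative judge prompt reproducing the example output above (gloss:
# "Question / Answer 1 / Answer 2, compare the two answers and output JSON:
# {reason, better answer}"); the training-time template may differ.
question = "What is the output range of the sigmoid function?"
answer1 = "The output range of the sigmoid function is -1 to 1."
answer2 = "The output range of the sigmoid function is 0 to 1."
prompt = (
    f"问题：{question}\n回答1：{answer1}\n回答2：{answer2}\n"
    '请比较两个回答，并以JSON格式输出：{"原因": "<你的理由>", "更好的回答": <1或2>}'
)

# Qwen2-style chat formatting via the tokenizer's built-in chat template.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated verdict is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Assuming the reply is valid JSON, as in the example above.
verdict = json.loads(response)  # e.g. {"原因": "...", "更好的回答": 2}
print(verdict["更好的回答"])
```

Loading in `bfloat16` with `device_map="auto"` keeps the 7B model within a single modern GPU; adjust as needed for your hardware.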

## Performance

Higher is better; the best result in each column is in bold, the second best is underlined.

<table>
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Infinity-<br>Preference</th>
<th rowspan="2">Ultra-<br>Feedback</th>
<th rowspan="2">PKU-<br>SafeRLHF</th>
<th colspan="4">Reward-Bench</th>
</tr>
<tr>
<th>Chat</th>
<th>Chat-H</th>
<th>Safety</th>
<th>Reasoning</th>
</tr>
<tr>
<td>Llama3.1-8B</td>
<td>59.0</td>
<td>62.9</td>
<td>66.4</td>
<td>80.7</td>
<td>49.8</td>
<td>64.0</td>
<td>68.1</td>
</tr>
<tr>
<td>Llama3.1-70B</td>
<td>64.0</td>
<td>71.4</td>
<td>67.6</td>
<td><b>97.2</b></td>
<td>70.2</td>
<td>82.8</td>
<td>86.0</td>
</tr>
<tr>
<td>Qwen2-7B</td>
<td>59.0</td>
<td>64.5</td>
<td>67.2</td>
<td>91.3</td>
<td>44.8</td>
<td>73.6</td>
<td>69.0</td>
</tr>
<tr>
<td>Qwen2.5-72B</td>
<td>70.0</td>
<td>66.0</td>
<td>58.7</td>
<td>86.6</td>
<td>61.4</td>
<td>74.5</td>
<td><b>90.7</b></td>
</tr>
<tr>
<td>Auto-J</td>
<td>69.0</td>
<td>63.9</td>
<td>66.9</td>
<td>93.0</td>
<td>40.0</td>
<td>65.5</td>
<td>50.5</td>
</tr>
<tr>
<td>Prometheus 2</td>
<td>68.0</td>
<td>63.3</td>
<td>63.0</td>
<td>85.5</td>
<td>49.1</td>
<td>77.1</td>
<td>76.5</td>
</tr>
<tr>
<td>GPT-4o</td>
<td><u>75.0</u></td>
<td><u>72.2</u></td>
<td><b>69.6</b></td>
<td><u>95.3</u></td>
<td><u>74.3</u></td>
<td><u>87.6</u></td>
<td>86.9</td>
</tr>
<tr>
<td>Con-J (ours)</td>
<td><b>81.0</b></td>
<td><b>73.0</b></td>
<td><u>68.4</u></td>
<td>91.3</td>
<td><b>79.6</b></td>
<td><b>88.0</b></td>
<td><u>87.1</u></td>
</tr>
</table>

## Reference

Coming soon.