---
datasets:
- Skywork/Skywork-Reward-Preference-80K-v0.1
base_model:
- Qwen/Qwen2-7B-Instruct
---

## Introduction

Con-J-Qwen2-7B (learning the generative ***J***udge using self-generated ***Con***trastive judgments) is a generative judge built on the Qwen2-7B-Instruct architecture and trained on the Skywork/Skywork-Reward-Preference-80K-v0.1 dataset.

Con-J-Qwen2-7B is trained from preference data. We prompt the pre-trained Qwen2-7B-Instruct model to generate positive and negative judgments, each supported by a rationale in natural language. These self-generated contrastive judgment pairs are then used to train the generative judge with Direct Preference Optimization (DPO). In this way, Con-J learns to act as a generative judge that provides accurate judgments along with supporting rationales.
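
Concretely, writing $x$ for a preference-annotated prompt, $j^{+}$ for a self-generated judgment that agrees with the preference annotation, and $j^{-}$ for one that contradicts it, training minimizes the standard DPO objective (notation ours; $\sigma$ is the logistic function, $\pi_{\mathrm{ref}}$ the frozen Qwen2-7B-Instruct policy, and $\beta$ the DPO temperature):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, j^{+},\, j^{-})}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(j^{+} \mid x)}{\pi_{\mathrm{ref}}(j^{+} \mid x)} - \beta \log \frac{\pi_\theta(j^{-} \mid x)}{\pi_{\mathrm{ref}}(j^{-} \mid x)}\right)\right]
$$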

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# ... (model loading, prompt construction, and generation; see the full sketch below) ...

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# response: {"原因": "回答1中的-1是错误的,因为sigmoid函数的实际输出范围是0到1,而不是包括-1。回答2准确地描述了sigmoid函数的输出范围是0到1。",\n "更好的回答": 2}
# Translation: {"reason": "The -1 in Answer 1 is wrong, because the actual output
# range of the sigmoid function is 0 to 1, not including -1. Answer 2 correctly
# describes the sigmoid output range as 0 to 1.", "better_answer": 2}
```
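
For a runnable end-to-end example, a minimal sketch follows. The hub ID `ZiyiYe/Con-J-Qwen2-7B` and the wording of the judge prompt are illustrative assumptions; the exact template used in training may differ.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub ID assumed from this repository's owner and model name.
model_id = "ZiyiYe/Con-J-Qwen2-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative judge prompt reproducing the example output above (gloss:
# "Question / Answer 1 / Answer 2, compare the two answers and output JSON:
# {reason, better answer}"); the training-time template may differ.
question = "What is the output range of the sigmoid function?"
answer1 = "The output range of the sigmoid function is -1 to 1."
answer2 = "The output range of the sigmoid function is 0 to 1."
prompt = (
    f"问题：{question}\n回答1：{answer1}\n回答2：{answer2}\n"
    '请比较两个回答，并以JSON格式输出：{"原因": "<你的理由>", "更好的回答": <1或2>}'
)

# Qwen2-style chat formatting via the tokenizer's built-in chat template.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated verdict is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Assuming the reply is valid JSON, as in the example above.
verdict = json.loads(response)  # e.g. {"原因": "...", "更好的回答": 2}
print(verdict["更好的回答"])
```

Loading in `bfloat16` with `device_map="auto"` keeps the 7B model within a single modern GPU; adjust as needed for your hardware.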

## Performance

Higher is better; the best result in each column is in bold, the second best is underlined.

<table>
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Infinity-<br>Preference</th>
<th rowspan="2">Ultra-<br>Feedback</th>
<th rowspan="2">PKU-<br>SafeRLHF</th>
<th colspan="4">Reward-Bench</th>
</tr>
<tr>
<th>Chat</th>
<th>Chat-H</th>
<th>Safety</th>
<th>Reasoning</th>
</tr>
<tr>
<td>Llama3.1-8B</td>
<td>59.0</td>
<td>62.9</td>
<td>66.4</td>
<td>80.7</td>
<td>49.8</td>
<td>64.0</td>
<td>68.1</td>
</tr>
<tr>
<td>Llama3.1-70B</td>
<td>64.0</td>
<td>71.4</td>
<td>67.6</td>
<td><b>97.2</b></td>
<td>70.2</td>
<td>82.8</td>
<td>86.0</td>
</tr>
<tr>
<td>Qwen2-7B</td>
<td>59.0</td>
<td>64.5</td>
<td>67.2</td>
<td>91.3</td>
<td>44.8</td>
<td>73.6</td>
<td>69.0</td>
</tr>
<tr>
<td>Qwen2.5-72B</td>
<td>70.0</td>
<td>66.0</td>
<td>58.7</td>
<td>86.6</td>
<td>61.4</td>
<td>74.5</td>
<td><b>90.7</b></td>
</tr>
<tr>
<td>Auto-J</td>
<td>69.0</td>
<td>63.9</td>
<td>66.9</td>
<td>93.0</td>
<td>40.0</td>
<td>65.5</td>
<td>50.5</td>
</tr>
<tr>
<td>Prometheus 2</td>
<td>68.0</td>
<td>63.3</td>
<td>63.0</td>
<td>85.5</td>
<td>49.1</td>
<td>77.1</td>
<td>76.5</td>
</tr>
<tr>
<td>GPT-4o</td>
<td><u>75.0</u></td>
<td><u>72.2</u></td>
<td><b>69.6</b></td>
<td><u>95.3</u></td>
<td><u>74.3</u></td>
<td><u>87.6</u></td>
<td>86.9</td>
</tr>
<tr>
<td>Con-J (ours)</td>
<td><b>81.0</b></td>
<td><b>73.0</b></td>
<td><u>68.4</u></td>
<td>91.3</td>
<td><b>79.6</b></td>
<td><b>88.0</b></td>
<td><u>87.1</u></td>
</tr>
</table>

## Reference

Coming soon.