Tags: Transformers · Safetensors · English · deberta-v2 · reward_model · reward-model · RLHF · evaluation · llm · instruction · reranking · Inference Endpoints
maywell committed on
Commit 1b901ae
1 Parent(s): 7fc0cce

Update README.md

Files changed (1)
  1. README.md +38 -0
README.md CHANGED
@@ -1,3 +1,41 @@
 ---
 license: apache-2.0
 ---
+ # Better Implementation for [*PairRM*](https://huggingface.co/llm-blender/PairRM)
+
+ # Introduction
+
+ This version of PairRM has some fixes to the training process, which improve the model's performance significantly.
+
+ ## **Minor Fixes**
+
+ ### Longer Context Length (2048 -> 3380)
+
+ Thanks to DeBERTa's tokenizer, the original PairRM model already had enough context length.
+
+ But the longer, the better :>
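+
+ A minimal usage sketch, assuming the Hugging Face `transformers` tokenizer API; the repository id below is a placeholder for this model:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Placeholder repository id; substitute the id of this model.
+ tokenizer = AutoTokenizer.from_pretrained("maywell/Better-PairRM")
+
+ # Truncate inputs at the longer context length (3380 tokens) used here.
+ encoded = tokenizer(
+     "How do I sort a list in Python?",
+     truncation=True,
+     max_length=3380,
+     return_tensors="pt",
+ )
+ ```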
+
+ ---
+
+ ## **Major Fixes**
+
+ ### Change Prompt Format
+
+ Why use something like the following?
+ ```
+ <Response i + 1> {response}
+ ```
+
+ So, I changed it to a prompt format based on Vicuna 1.1.
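+
+ For reference, a minimal sketch of the Vicuna 1.1 conversation style this format is based on; the exact template used for training is not spelled out here, so treat this rendering as an illustration:
+
+ ```python
+ def render_vicuna_v1_1(instruction: str, response: str) -> str:
+     """Render a single turn in the Vicuna 1.1 style (system prompt, then
+     USER/ASSISTANT roles separated by single spaces)."""
+     system = (
+         "A chat between a curious user and an artificial intelligence assistant. "
+         "The assistant gives helpful, detailed, and polite answers to the user's questions."
+     )
+     return f"{system} USER: {instruction} ASSISTANT: {response}"
+
+ print(render_vicuna_v1_1("Name three primary colors.", "Red, blue, and yellow."))
+ ```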
+
+ ---
+
+ ### Change Truncation Side
+
+ The original process used right-side truncation even on the input. This can cause serious problems when the input exceeds the model's sequence length.
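+
+ A minimal sketch, assuming the `transformers` tokenizer API, of switching the truncation side so over-long text is cut from the left instead of the right; which side this model actually uses for the input is an assumption here:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Placeholder repository id; substitute the id of this model.
+ tokenizer = AutoTokenizer.from_pretrained("maywell/Better-PairRM")
+
+ # Cut tokens from the left so the tail of a too-long input survives,
+ # instead of silently losing everything past the sequence length.
+ tokenizer.truncation_side = "left"
+
+ long_input = "A very long instruction ... " * 500
+ encoded = tokenizer(long_input, truncation=True, max_length=3380)
+ ```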
+
+ ---
+
+ ### Dataset Filter
+
+ There was a decent amount of empty assistant responses in the original dataset, so I dropped them.
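+
+ A minimal sketch of that kind of filter, assuming the Hugging Face `datasets` library; the dataset path and the `output` column name are placeholders, not the actual training data layout:
+
+ ```python
+ from datasets import load_dataset
+
+ # Placeholder dataset path and column name, for illustration only.
+ dataset = load_dataset("path/to/pairwise-training-data", split="train")
+
+ # Drop examples whose assistant response is empty after stripping whitespace.
+ dataset = dataset.filter(lambda example: len(example["output"].strip()) > 0)
+ ```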