---
license: apache-2.0
---
# Better Implementation for PairRM
## Introduction

This version of PairRM includes several fixes to the training process, which significantly improve the model's performance.
## Minor Fixes
- Longer Context Length (2048 -> 3370)

Thanks to DeBERTa's tokenizer, the original PairRM model already had enough context length. But, the longer the better :>
## Major Fixes
- Change Prompt Format

Why use something like `<Response i + 1> {response}`? Instead, I changed it to a format based on Vicuna 1.1.
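The exact template isn't documented here, but a minimal sketch of what a Vicuna-1.1-style candidate prompt could look like follows; the `format_candidate` helper and the system prompt wording are illustrative assumptions, not the model's confirmed template.

```python
# Illustrative sketch only: the exact template used by Better-PairRM is not
# documented in this card. Vicuna 1.1 renders a turn as "USER: ... ASSISTANT: ...".
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def format_candidate(prompt: str, response: str) -> str:
    """Hypothetical helper: render one (prompt, response) pair in Vicuna-1.1
    style instead of the original "<Response i + 1> {response}" scheme."""
    return f"{VICUNA_SYSTEM} USER: {prompt} ASSISTANT: {response}"
```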
- Change Truncation Side

The original process truncated from the right side, even on the input. This can cause serious problems when the input exceeds the model's context length.
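A minimal sketch of the fix with a Hugging Face tokenizer; the base checkpoint name is an assumption, while `truncation_side` is standard `transformers` API.

```python
from transformers import AutoTokenizer

# Sketch assuming a DeBERTa base (the checkpoint name is an illustrative
# assumption, not necessarily the one used for this model).
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")

# Truncate the input from the left, so overly long sources lose their oldest
# tokens instead of cutting off the end of the conversation.
tokenizer.truncation_side = "left"

long_input = "..."  # a long multi-turn source text
enc = tokenizer(long_input, truncation=True, max_length=2030)  # source budget
```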
- Dataset Filter

There was a decent amount of empty assistant responses in the original dataset, so I dropped them.
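A sketch of such a filter with 🤗 Datasets; the dataset name and column are placeholders, not the actual training data.

```python
from datasets import load_dataset

# Placeholder dataset/column names: the actual training data is not named here.
ds = load_dataset("user/preference-dataset", split="train")

# Drop examples whose assistant response is missing or empty.
ds = ds.filter(lambda ex: ex["response"] is not None and ex["response"].strip() != "")
```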
## Statistics

### Context length
| PairRanker type | Source max length | Candidate max length | Total max length |
|---|---|---|---|
| pair-ranker | 128 | 128 | 384 |
| PairRM | 1224 | 412 | 2048 |
| Better-PairRM (this model) | 2030 | 670 | 3370 |
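The totals appear to follow total = source + 2 × candidate, consistent with each pairwise comparison encoding one source and two candidate responses; a quick sanity check:

```python
# Each pairwise comparison encodes one source and two candidates, so
# total = source + 2 * candidate holds for every row of the table above.
SOURCE_MAX, CANDIDATE_MAX = 2030, 670
assert SOURCE_MAX + 2 * CANDIDATE_MAX == 3370  # Better-PairRM
assert 1224 + 2 * 412 == 2048                  # PairRM
assert 128 + 2 * 128 == 384                    # pair-ranker
```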
## Performance

### Reward-Bench by AllenAI
| Metric | llm-blender/PairRM-hf | maywell/Better-PairRM |
|---|---|---|
| model_type | Custom Classifier | Custom Classifier |
| alpacaeval-length | 0.758 | 0.863 |
| alpacaeval-hard | 0.979 | 1.000 |
| alpacaeval-easy | 0.970 | 0.990 |
| donotanswer | 0.360 | 0.522 |
| hep-cpp | 0.628 | 0.646 |
| hep-go | 0.689 | 0.713 |
| hep-java | 0.628 | 0.713 |
| hep-js | 0.604 | 0.707 |
| hep-python | 0.646 | 0.713 |
| hep-rust | 0.652 | 0.726 |
| llmbar-adver-GPTInst | 0.304 | 0.141 |
| llmbar-adver-GPTOut | 0.596 | 0.447 |
| llmbar-adver-manual | 0.500 | 0.261 |
| llmbar-adver-neighbor | 0.433 | 0.276 |
| llmbar-natural | 0.800 | 0.720 |
| math-prm | 0.333 | 0.295 |
| mt-bench-hard | 0.649 | 0.703 |
| mt-bench-med | 0.900 | 1.000 |
| mt-bench-easy | 0.964 | 0.929 |
| refusals-dangerous | 0.080 | 0.730 |
| refusals-offensive | 0.010 | 0.940 |
| xstest-should-refuse | 0.370 | 0.968 |
| xstest-should-respond | 0.952 | 0.876 |
| average | 0.600 | 0.690 |
Note: the llmbar test scores look a bit weird across all models on Reward-Bench.
## Thanks to
- Sionic AI for providing the A100 cluster.