Tags: Transformers | Safetensors | English | deberta-v2 | reward_model | reward-model | RLHF | evaluation | llm | instruction | reranking | Inference Endpoints
maywell committed
Commit a775416 (1 parent: 4c4f60b)

Update README.md

Files changed (1): README.md (+63 -9)
README.md CHANGED
@@ -3,13 +3,13 @@ license: apache-2.0
  ---
  # Better Implementation for [*PairRM*](https://huggingface.co/llm-blender/PairRM)

- # **Introduction**
+ ## Introduction

  This version of PairRM has some fixes to the training process, which improve the model's performance significantly.

- ## **Minor Fixes**
+ ### Minor Fixes

- ### Longer Context Length (2048 -> 3370)
+ - Longer Context Length (2048 -> 3370)

  Thanks to deberta's tokenizer, the original PairRM model already had enough context length.

@@ -17,9 +17,9 @@ But, the longer the better :>

  ---

- ## **Major Fixes**
+ ### Major Fixes

- ### Change Prompt Format
+ - Change Prompt Format

  Why use something like
  ```
@@ -30,12 +30,66 @@ So, I changed to a format based on Vicuna 1.1.

  ---

- ### Change Truncate side
+ - Change Truncate side

- The original process was using right-side truncation even on the input, which can cause serious problems when the input exceeds the model's seq len.
+ The original process was using right-side truncation even on the input, which can cause serious problems when the input exceeds the model's context length.

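For illustration, the left-truncation fix might look like the following with the Hugging Face `transformers` tokenizer API. This is a minimal sketch, not this repository's training code; the `microsoft/deberta-v3-large` tokenizer and the example strings are stand-ins, and only the `truncation_side` setting and the 3370 budget reflect what this card describes.

```python
from transformers import AutoTokenizer

# PairRM is built on a DeBERTa backbone, so its tokenizer is used as a stand-in here.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
tokenizer.truncation_side = "left"  # default is "right"

source = "A very long multi-turn conversation history ..."
candidate = "A candidate response to be scored."

# With left-side truncation, an overlong input loses tokens from its beginning,
# so the most recent turns and the candidate text survive instead of being cut off.
encoded = tokenizer(
    source,
    candidate,
    truncation=True,
    max_length=3370,  # the extended budget mentioned above
    return_tensors="pt",
)
print(encoded["input_ids"].shape)
```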
  ---

- ### Dataset Filter
+ - Dataset Filter

- There were a decent number of empty assistant responses in the original dataset, so I dropped them.
+ There were a decent number of empty assistant responses in the original dataset, so I dropped them.
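The dataset filter above can be pictured as a simple predicate over the training examples. The field names below (`prompt`, `candidates`) are assumptions for illustration, not the actual schema of the training data:

```python
# Toy examples; "prompt" and "candidates" are assumed field names.
raw_examples = [
    {"prompt": "What is 2 + 2?", "candidates": ["4", "It's 4."]},
    {"prompt": "Summarize the article.", "candidates": ["", "Here is a short summary ..."]},
]

def has_nonempty_candidates(example):
    # Keep an example only if every assistant/candidate response contains visible text.
    return all(resp.strip() for resp in example["candidates"])

filtered = [ex for ex in raw_examples if has_nonempty_candidates(ex)]
print(f"{len(raw_examples)} -> {len(filtered)} examples")  # 2 -> 1
```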
+
+ ---
+
+ ## Statistics
+
+ ### Context length
+ | PairRanker type | Source max length | Candidate max length | Total max length |
+ |:---------------:|:-----------------:|:--------------------:|:----------------:|
+ | [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
+ | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) | 1224 | 412 | 2048 |
+ | [Better-PairRM](https://huggingface.co/maywell/Better-PairRM/) (This model) | 2030 | 670 | 3370 |
+
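Each total above appears to be the source budget plus two candidate budgets, matching the pairwise input of one source packed with the two candidates being compared; a quick check:

```python
# Sanity check: total = source + 2 * candidate for every row of the table above.
budgets = {
    "pair-ranker": (128, 128, 384),
    "PairRM": (1224, 412, 2048),
    "Better-PairRM": (2030, 670, 3370),
}
for name, (source, candidate, total) in budgets.items():
    assert source + 2 * candidate == total, name
print("all totals consistent")
```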
+ ### Performance
+
+ #### Reward-Bench by AllenAI
+
+ | Metric | llm-blender/PairRM-hf | maywell/Better-PairRM |
+ |----------------------------|------------------------|------------------------|
+ | model | llm-blender/PairRM-hf | maywell/Better-PairRM |
+ | model_type | Custom Classifier | Custom Classifier |
+ | alpacaeval-length | 0.758 | **0.863** |
+ | alpacaeval-hard | 0.979 | **1.000** |
+ | alpacaeval-easy | 0.970 | **0.990** |
+ | donotanswer | 0.360 | **0.522** |
+ | hep-cpp | 0.628 | **0.646** |
+ | hep-go | 0.689 | **0.713** |
+ | hep-java | 0.628 | **0.713** |
+ | hep-js | 0.604 | **0.707** |
+ | hep-python | 0.646 | **0.713** |
+ | hep-rust | 0.652 | **0.726** |
+ | llmbar-adver-GPTInst | **0.304** | 0.141 |
+ | llmbar-adver-GPTOut | **0.596** | 0.447 |
+ | llmbar-adver-manual | **0.500** | 0.261 |
+ | llmbar-adver-neighbor | **0.433** | 0.276 |
+ | llmbar-natural | **0.800** | 0.720 |
+ | math-prm | **0.333** | 0.295 |
+ | mt-bench-hard | 0.649 | **0.703** |
+ | mt-bench-med | 0.900 | **1.000** |
+ | mt-bench-easy | **0.964** | 0.929 |
+ | refusals-dangerous | 0.080 | **0.730** |
+ | refusals-offensive | 0.010 | **0.940** |
+ | xstest-should-refuse | 0.370 | **0.968** |
+ | xstest-should-respond | **0.952** | 0.876 |
+ | average | 0.600 | **0.690** |
+
+ > *Note - the llmbar test scores are a bit weird across all models on [Reward-Bench](https://huggingface.co/spaces/allenai/reward-bench)*
+
+ ## Thanks to
+
+ - [Sionic AI](https://sionic.ai/) for providing the A100 cluster.
+
+ ## Contact
+
+ - [Discord Server Link](https://discord.gg/MrBt3PXdXc)