Spaces:
Running
Running
natolambert
commited on
Commit
•
65e180d
1
Parent(s):
b7aaef4
details
Browse files
src/md.py
CHANGED
@@ -2,13 +2,23 @@ ABOUT_TEXT = """
|
|
2 |
We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
|
3 |
A win is when the score for the chosen response is higher than the score for the rejected response.
|
4 |
|
|
|
|
|
5 |
We average over 4 core sections (per prompt weighting):
|
6 |
-
1. Chat
|
7 |
-
2. Chat Hard
|
8 |
-
3. Safety
|
9 |
-
4. Code
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10 |
|
11 |
-
|
12 |
|
13 |
Total number of the prompts is: 2538, filtered from 4676.
|
14 |
|
|
|
2 |
We compute the win percentage for a reward model on hand curated chosen-rejected pairs for each prompt.
|
3 |
A win is when the score for the chosen response is higher than the score for the rejected response.
|
4 |
|
5 |
+
## Overview
|
6 |
+
|
7 |
We average over 4 core sections (per prompt weighting):
|
8 |
+
1. **Chat**: Includes the easy chat subsets (alpacaeval-easy, alpacaeval-length, alpacaeval-hard, mt-bench-easy, mt-bench-medium)
|
9 |
+
2. **Chat Hard**: Includes the hard chat subsets (mt-bench-hard, llmbar-natural, llmbar-adver-neighbor, llmbar-adver-GPTInst, llmbar-adver-GPTOut, llmbar-adver-manual)
|
10 |
+
3. **Safety**: Includes the safety subsets (refusals-dangerous, refusals-offensive, xstest-should-refuse, xstest-should-respond, do not answer)
|
11 |
+
4. **Code**: Includes the code subsets (hep-cpp, hep-go, hep-java, hep-js, hep-python, hep-rust)
|
12 |
+
|
13 |
+
We include multiple types of reward models in this evaluation:
|
14 |
+
1. **Sequence Classifiers** (Seq. Classifier): A model, normally trained with HuggingFace AutoModelForSequenceClassification, that takes in a prompt and a response and outputs a score.
|
15 |
+
2. **Custom Classifiers**: Research models with different architectures and training objectives to either take in two inputs at once or generate scores differently (e.g. PairRM and Stanford SteamSHP).
|
16 |
+
3. **DPO**: Models trained with Direct Preference Optimization (DPO), with modifiers such as `-ref-free` or `-norm` changing how scores are computed.
|
17 |
+
4. **Random**: Random choice baseline.
|
18 |
+
|
19 |
+
Others, such as **Generative Judge** are coming soon.
|
20 |
|
21 |
+
### Subset Details
|
22 |
|
23 |
Total number of the prompts is: 2538, filtered from 4676.
|
24 |
|