Update README.md
README.md
CHANGED
@@ -41,13 +41,7 @@ Apart from that, one can also use PairRM to further align instruction-tuned LLMs
 Unlike the other RMs that encode and score each candidate respectively,
 PairRM takes a pair of candidates and compares them side-by-side to indentify the subtle differences between them.
 Also, PairRM is based on [`microsoft/deberta-v3-large`](https://huggingface.co/microsoft/deberta-v3-large), and thus it is super efficient: 0.4B.
-We trained PairRM on a diverse collection of six human-preference datasets
-- [`UltraFeedback`](https://huggingface.co/datasets/openbmb/UltraFeedback)
-- [`HH-RLHF`](https://huggingface.co/datasets/Anthropic/hh-rlhf)
-- [`summarize_from_feedback`](https://huggingface.co/datasets/openai/summarize_from_feedback)
-- [`chatbot_arena_conversations`](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
-- [`webgpt_comparisons`](https://huggingface.co/datasets/openai/webgpt_comparisons)
-- [`instruct-synthetic-prompt-responses`](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses).
+We trained PairRM on a diverse collection of six human-preference datasets (see more [here](https://huggingface.co/llm-blender/PairRM#training-datasets)).
 
 PairRM is part of the LLM-Blender project (ACL 2023). Please see our [paper](https://arxiv.org/abs/2306.02561) above to know more.
 