Update README.md
README.md
CHANGED
@@ -34,13 +34,14 @@ pipeline_tag: text-generation
 
 Pairwise Reward Model (PairRM) takes an instruction and a **pair** of output candidates as the input,
 and output a score for each candidate to measure their **relative** quality.
-Unlike the other RMs that encode and score each candidate respectively,
-PairRM takes a pair of candidates and compares them side-by-side to indentify the subtle differences between them.
-
 PairRM can be used to (re-)rank a list of candidate outputs and thus can be used an LLM evaluator to efficiently assess the quality of LLMs in local environment.
 PairRM can also be used to enhance the decoding by `best-of-n sampling` (i.e., reranking N sampled outputs).
 Apart from that, one can also use PairRM to further align instruction-tuned LLMs with RLHF methods.
 
+Unlike the other RMs that encode and score each candidate respectively,
+PairRM takes a pair of candidates and compares them side-by-side to indentify the subtle differences between them.
+Also, PairRM is based on DeBERTa-large, and thus it is super efficient: 0.4B.
+We trained PairRM on a diverse collection of human preference datasets such as UltraFeedback, HH-RLHF, chatbot-arena, etc.
 PairRM is part of the LLM-Blender project (ACL 2023). Please see our paper linked above to know more.
 
 
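The README text in this diff describes using a pairwise scorer to (re-)rank candidates and to do best-of-n sampling. A minimal sketch of that idea follows, under stated assumptions: `rank_candidates`, `best_of_n`, and the `compare(instruction, a, b)` signature are hypothetical names chosen for illustration, and `toy_compare` is a word-overlap stand-in, not PairRM or the LLM-Blender API. Summing pairwise margins is one simple aggregation choice and may differ from what the project actually does.

```python
from itertools import permutations
from typing import Callable, List

# Hypothetical comparator type: returns a positive number when `a` is
# judged better than `b` for the given instruction, negative otherwise.
PairwiseScorer = Callable[[str, str, str], float]


def rank_candidates(
    instruction: str, candidates: List[str], compare: PairwiseScorer
) -> List[int]:
    """Return candidate indices ordered best-first by summed pairwise margins."""
    totals = [0.0] * len(candidates)
    # Compare every ordered pair (i, j) and credit the margin to candidate i.
    for i, j in permutations(range(len(candidates)), 2):
        totals[i] += compare(instruction, candidates[i], candidates[j])
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(range(len(candidates)), key=lambda i: totals[i], reverse=True)


def best_of_n(
    instruction: str, candidates: List[str], compare: PairwiseScorer
) -> str:
    """Best-of-n selection: return the top-ranked candidate among n samples."""
    return candidates[rank_candidates(instruction, candidates, compare)[0]]


def toy_compare(instruction: str, a: str, b: str) -> float:
    """Stand-in scorer (assumption): prefers more word overlap with the instruction."""
    words = set(instruction.lower().split())

    def overlap(text: str) -> int:
        return len(words & set(text.lower().split()))

    return float(overlap(a) - overlap(b))


if __name__ == "__main__":
    samples = ["a banana", "a red fruit is an apple", "red car"]
    print(rank_candidates("name a red fruit", samples, toy_compare))
    print(best_of_n("name a red fruit", samples, toy_compare))
```

To use a real model, one would replace `toy_compare` with a function that wraps the pairwise model's relative score for two candidates; the ranking logic itself does not depend on how that score is produced. Note that this scheme makes n·(n−1) comparisons for n candidates.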