Update README.md
README.md
CHANGED
@@ -25,13 +25,22 @@ pipeline_tag: text-generation
# Pairwise Reward Model for LLMs (PairRM) from LLM-Blender

- Github: [https://github.com/yuchenlin/LLM-Blender](https://github.com/yuchenlin/LLM-Blender)
- Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
- Space Demo: [https://huggingface.co/spaces/llm-blender/LLM-Blender](https://huggingface.co/spaces/llm-blender/LLM-Blender)

## Introduction

Pairwise Reward Model (PairRM) takes an instruction and a **pair** of output candidates as input,
and outputs a score for each candidate to measure their **relative** quality.
Unlike other RMs that encode and score each candidate independently,
PairRM takes a pair of candidates and compares them side by side to identify the subtle differences between them.

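To make the contrast concrete, here is a minimal sketch of the two interface shapes. It is not the PairRM or llm-blender API: `pointwise_score`, `pairwise_scores`, and the word-overlap heuristic behind them are hypothetical stand-ins for a learned model. The only point is that a pointwise scorer sees one candidate at a time, while a pairwise scorer receives both candidates together and returns one score per candidate.

```python
from typing import Tuple


def _overlap(instruction: str, candidate: str) -> int:
    """Toy relevance signal: distinct instruction words that also appear in the candidate."""
    return len(set(instruction.lower().split()) & set(candidate.lower().split()))


def pointwise_score(instruction: str, candidate: str) -> float:
    """Hypothetical pointwise RM: scores a candidate without seeing any alternative."""
    return float(_overlap(instruction, candidate))


def pairwise_scores(instruction: str, cand_a: str, cand_b: str) -> Tuple[float, float]:
    """Hypothetical pairwise RM: receives both candidates and returns relative scores.

    The two scores sum to zero, so only their difference carries meaning.
    This toy reduces to a difference of overlaps; a trained pairwise model
    would instead encode both candidates jointly.
    """
    diff = _overlap(instruction, cand_a) - _overlap(instruction, cand_b)
    return diff / 2.0, -diff / 2.0


instruction = "name the capital of france"
cand_a = "The capital of France is Paris."
cand_b = "I like trains."

score_a, score_b = pairwise_scores(instruction, cand_a, cand_b)
print(score_a > score_b)  # True when cand_a is judged better than cand_b
```

Because the pairwise scores are relative, they are only comparable within the same pair, which is why ranking more than two candidates needs an aggregation step.
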
PairRM can be used to (re-)rank a list of candidate outputs, and thus can serve as an LLM evaluator that efficiently assesses the quality of LLMs in a local environment.
PairRM can also be used to enhance decoding via `best-of-n sampling` (i.e., reranking N sampled outputs).
Apart from that, one can also use PairRM to

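The reranking and best-of-n uses described above can be sketched as follows. This is an illustration under stated assumptions, not the llm-blender implementation: `toy_compare`, `rank_candidates`, and `best_of_n` are hypothetical names, the comparer is a word-overlap heuristic standing in for a pairwise reward model, and summing each candidate's score over all pairings is just one simple way to turn pairwise results into a ranking.

```python
from itertools import permutations
from typing import Callable, List, Tuple

# A pairwise comparer maps (instruction, cand_a, cand_b) to (score_a, score_b).
PairwiseFn = Callable[[str, str, str], Tuple[float, float]]


def toy_compare(instruction: str, cand_a: str, cand_b: str) -> Tuple[float, float]:
    """Hypothetical stand-in for a pairwise reward model (word overlap with the instruction)."""
    words = set(instruction.lower().split())
    a = len(words & set(cand_a.lower().split()))
    b = len(words & set(cand_b.lower().split()))
    return float(a - b), float(b - a)


def rank_candidates(instruction: str, candidates: List[str], compare: PairwiseFn) -> List[int]:
    """Return candidate indices ordered best-first by total pairwise score."""
    totals = [0.0] * len(candidates)
    # Visit every ordered pair so each candidate is scored in both positions;
    # this matters for real models, whose output may depend on candidate order.
    for i, j in permutations(range(len(candidates)), 2):
        score_i, _ = compare(instruction, candidates[i], candidates[j])
        totals[i] += score_i
    return sorted(range(len(candidates)), key=lambda k: totals[k], reverse=True)


def best_of_n(instruction: str, candidates: List[str], compare: PairwiseFn) -> str:
    """Best-of-n selection: keep the top-ranked of N sampled outputs."""
    return candidates[rank_candidates(instruction, candidates, compare)[0]]


instruction = "translate hello to french"
sampled = ["bonjour", "hello in french is bonjour", "french"]

order = rank_candidates(instruction, sampled, toy_compare)
best = best_of_n(instruction, sampled, toy_compare)
print(order, best)
```

Comparing all ordered pairs costs N(N-1) comparisons for N candidates, so for large N a cheaper aggregation scheme may be preferable.
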
## Installation
Since PairRanker contains some custom layers and tokens, we recommend using PairRM with our llm-blender code API.