Text Generation, Transformers, Safetensors, English, deberta, reward_model, reward-model, RLHF, evaluation, llm, instruction, reranking, Inference Endpoints
yuchenlin committed c845907 (1 parent: 671d616)

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -25,13 +25,22 @@ pipeline_tag: text-generation
  # Pairwise Reward Model for LLMs (PairRM) from LLM-Blender
 
 
-
  - Github: [https://github.com/yuchenlin/LLM-Blender](https://github.com/yuchenlin/LLM-Blender)
  - Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
  - Space Demo: [https://huggingface.co/spaces/llm-blender/LLM-Blender](https://huggingface.co/spaces/llm-blender/LLM-Blender)
 
+
  ## Introduction
 
+ Pairwise Reward Model (PairRM) takes an instruction and a **pair** of output candidates as the input,
+ and outputs a score for each candidate to measure their **relative** quality.
+ Unlike other RMs that encode and score each candidate independently,
+ PairRM takes a pair of candidates and compares them side by side to identify the subtle differences between them.
+
+ PairRM can be used to (re-)rank a list of candidate outputs, and thus can serve as an LLM evaluator that efficiently assesses the quality of LLMs in a local environment.
+ PairRM can also be used to enhance decoding via `best-of-n sampling` (i.e., reranking N sampled outputs).
+ Apart from that, one can also use PairRM to
+
 
  ## Installation
  Since PairRanker contains some custom layers and tokens, we recommend using PairRM with our llm-blender code API.
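The new Introduction describes turning pairwise comparisons into a ranking and using that ranking for best-of-n sampling. The Python below is a minimal sketch of that idea under stated assumptions, not the llm-blender API: it assumes a pairwise scorer that takes (instruction, candidate A, candidate B) and returns one score per candidate, and it aggregates outcomes by counting wins over both orderings of every pair. `PairwiseScorer`, `rank_by_pairwise_wins`, `best_of_n`, and `toy_scorer` are hypothetical names; `toy_scorer` is a naive length heuristic standing in for PairRM, and win counting is one simple aggregation choice, not necessarily the one LLM-Blender uses.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical pairwise scorer interface (an assumption, not the llm-blender API):
# given (instruction, candidate_a, candidate_b), return (score_a, score_b), where
# the two scores express the RELATIVE quality of a versus b.
PairwiseScorer = Callable[[str, str, str], Tuple[float, float]]


def rank_by_pairwise_wins(
    instruction: str, candidates: Sequence[str], scorer: PairwiseScorer
) -> List[int]:
    """Return candidate indices ordered best-first by number of pairwise wins.

    Every pair is compared in both orders (i vs j and j vs i) so that neither
    candidate always sits in the first slot; ties keep the original order.
    """
    wins = [0] * len(candidates)
    for i, j in permutations(range(len(candidates)), 2):
        score_i, score_j = scorer(instruction, candidates[i], candidates[j])
        if score_i > score_j:
            wins[i] += 1
        elif score_j > score_i:
            wins[j] += 1
    return sorted(range(len(candidates)), key=lambda k: (-wins[k], k))


def best_of_n(instruction: str, samples: Sequence[str], scorer: PairwiseScorer) -> str:
    """Best-of-n selection: rerank N sampled outputs and keep the top one."""
    if not samples:
        raise ValueError("best_of_n needs at least one sample")
    return samples[rank_by_pairwise_wins(instruction, samples, scorer)[0]]


def toy_scorer(instruction: str, a: str, b: str) -> Tuple[float, float]:
    """Stand-in for PairRM, for demonstration only: prefers the longer answer."""
    return float(len(a)), float(len(b))


if __name__ == "__main__":
    question = "What is the capital of France?"
    candidates = ["Paris.", "The capital of France is Paris.", "I don't know"]
    print(rank_by_pairwise_wins(question, candidates, toy_scorer))  # [1, 2, 0]
    print(best_of_n(question, candidates, toy_scorer))
```

Comparing both orderings costs n(n-1) scorer calls for n candidates; the payoff is that a scorer biased toward one input slot cannot, on its own, favor any particular candidate.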