llm-blender/PairRM

yuchenlin committed on Nov 23, 2023

Commit c845907 (1 parent: 671d616)

Update README.md

Files changed (1)

README.md +10 -1

README.md CHANGED

@@ -25,13 +25,22 @@ pipeline_tag: text-generation
 # Pairwise Reward Model for LLMs (PairRM) from LLM-Blender
 - GitHub: [https://github.com/yuchenlin/LLM-Blender](https://github.com/yuchenlin/LLM-Blender)
 - Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
 - Space Demo: [https://huggingface.co/spaces/llm-blender/LLM-Blender](https://huggingface.co/spaces/llm-blender/LLM-Blender)
 ## Introduction
+Pairwise Reward Model (PairRM) takes an instruction and a **pair** of output candidates as input,
+and outputs a score for each candidate to measure their **relative** quality.
+Unlike other RMs, which encode and score each candidate separately,
+PairRM takes a pair of candidates and compares them side by side to identify the subtle differences between them.
+PairRM can be used to (re-)rank a list of candidate outputs, and thus can serve as an LLM evaluator that efficiently assesses the quality of LLMs in a local environment.
+PairRM can also be used to enhance decoding via `best-of-n sampling` (i.e., reranking N sampled outputs).
+Apart from that, one can also use PairRM to
 ## Installation
 Since PairRanker contains some custom layers and tokens, we recommend using PairRM with our llm-blender code API.
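The Introduction added in this commit describes two uses of a pairwise reward model: ranking a list of candidates by comparing them in pairs, and best-of-n sampling (picking the top-ranked of N samples). A minimal sketch of both follows, assuming a round-robin aggregation of pairwise wins; this is not necessarily how PairRM or llm-blender aggregates comparisons. The names `toy_pair_scorer`, `rank_candidates`, and `best_of_n` are hypothetical, and the toy scorer (word overlap with the instruction) only stands in for the model, which the README recommends loading through the llm-blender code API.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical pairwise-scorer signature (assumption, not the llm-blender API):
# given an instruction and two candidates, return one relative score per candidate.
PairScorer = Callable[[str, str, str], Tuple[float, float]]


def toy_pair_scorer(instruction: str, cand_a: str, cand_b: str) -> Tuple[float, float]:
    """Toy stand-in for PairRM: scores each candidate by words shared with the instruction."""
    words = set(instruction.lower().split())
    return (
        float(len(words & set(cand_a.lower().split()))),
        float(len(words & set(cand_b.lower().split()))),
    )


def rank_candidates(instruction: str, candidates: Sequence[str], scorer: PairScorer) -> List[int]:
    """Rank candidates by counting pairwise wins over all pairs (round-robin).

    Returns 1-based ranks aligned with `candidates` (1 = best). Each pair is
    scored in both orders to reduce position bias in a side-by-side scorer.
    """
    wins = [0.0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        fwd_i, fwd_j = scorer(instruction, candidates[i], candidates[j])
        rev_j, rev_i = scorer(instruction, candidates[j], candidates[i])
        margin = (fwd_i + rev_i) - (fwd_j + rev_j)
        if margin > 0:
            wins[i] += 1.0
        elif margin < 0:
            wins[j] += 1.0
        else:  # tie: split the win between both candidates
            wins[i] += 0.5
            wins[j] += 0.5
    # Sort indices by win count, best first (stable sort keeps input order on ties).
    order = sorted(range(len(candidates)), key=lambda k: wins[k], reverse=True)
    ranks = [0] * len(candidates)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


def best_of_n(instruction: str, samples: Sequence[str], scorer: PairScorer) -> str:
    """Best-of-n sampling: return the sample ranked first among the N samples."""
    ranks = rank_candidates(instruction, samples, scorer)
    return samples[ranks.index(1)]


if __name__ == "__main__":
    samples = ["Blue is a primary color", "A color", "I like turtles"]
    print(rank_candidates("Name a primary color", samples, toy_pair_scorer))  # [1, 2, 3]
    print(best_of_n("Name a primary color", samples, toy_pair_scorer))  # Blue is a primary color
```

Scoring every pair in both orders makes N(N-1) scorer calls for N candidates, which is manageable for the small N typical of best-of-n sampling; for long candidate lists a cheaper aggregation (for example, comparing against a single running best) trades accuracy for fewer calls.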