Text Generation
Transformers
Safetensors
English
deberta
reward_model
reward-model
RLHF
evaluation
llm
instruction
reranking
Inference Endpoints
yuchenlin committed on
Commit
e066c87
1 Parent(s): 90f9aa4

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -34,13 +34,14 @@ pipeline_tag: text-generation
 
 Pairwise Reward Model (PairRM) takes an instruction and a **pair** of output candidates as the input,
 and outputs a score for each candidate to measure their **relative** quality.
-Unlike other RMs that encode and score each candidate separately,
-PairRM takes a pair of candidates and compares them side-by-side to identify the subtle differences between them.
-
 PairRM can be used to (re-)rank a list of candidate outputs and thus can serve as an LLM evaluator to efficiently assess the quality of LLMs in a local environment.
 PairRM can also be used to enhance decoding via `best-of-n sampling` (i.e., reranking N sampled outputs).
 Apart from that, one can also use PairRM to further align instruction-tuned LLMs with RLHF methods.
 
+Unlike other RMs that encode and score each candidate separately,
+PairRM takes a pair of candidates and compares them side-by-side to identify the subtle differences between them.
+Also, PairRM is based on DeBERTa-large and is thus very efficient (0.4B parameters).
+We trained PairRM on a diverse collection of human preference datasets such as UltraFeedback, HH-RLHF, chatbot-arena, etc.
 PairRM is part of the LLM-Blender project (ACL 2023). Please see our paper linked above to know more.
 
 
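For reference, the reranking / best-of-n use case described in the README can be sketched roughly as below. This is a minimal sketch, assuming the `llm-blender` package and its `Blender.loadranker` / `Blender.rank` interface as described in the LLM-Blender repository; the example instruction and candidates are made up for illustration, and the exact method names and return format should be verified against the project README.

```python
# Minimal best-of-n reranking sketch with PairRM (assumption: llm-blender is
# installed, e.g. `pip install git+https://github.com/yuchenlin/LLM-Blender.git`).
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the pairwise ranker checkpoint

# One instruction with two hypothetical candidate outputs (illustrative only).
inputs = ["Explain the difference between a list and a tuple in Python."]
candidates = [[
    "A list is mutable, while a tuple is immutable.",
    "Both are sequences; there is no difference.",
]]

# rank() returns, per input, a rank for each candidate; in the LLM-Blender
# README example the ranks are 1-indexed, so rank 1 marks the best candidate.
ranks = blender.rank(inputs, candidates)
best = [cands[list(r).index(1)] for cands, r in zip(candidates, ranks)]
print(best[0])
```

The same ranking call is what `best-of-n sampling` amounts to: sample N outputs from an LLM, rank them with PairRM, and keep the top-ranked one.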