Text Generation
Transformers
Safetensors
English
deberta
reward_model
reward-model
RLHF
evaluation
llm
instruction
reranking
Inference Endpoints
yuchenlin committed
Commit 447053d
1 Parent(s): c845907

Update README.md

Files changed (1)
  1. README.md +24 -14
README.md CHANGED
@@ -39,17 +39,19 @@ PairRM takes a pair of candidates and compares them side-by-side to identify the better one.
 
  PairRM can be used to (re-)rank a list of candidate outputs and can thus serve as an LLM evaluator to efficiently assess the quality of LLMs in a local environment.
  PairRM can also be used to enhance decoding via `best-of-n sampling` (i.e., reranking N sampled outputs).
- Apart from that, one can also use PairRM to
+ Apart from that, one can also use PairRM to further align instruction-tuned LLMs with RLHF methods.
+
+ PairRM is part of the LLM-Blender project (ACL 2023). Please see our paper linked above to learn more.
 
 
  ## Installation
- Since PairRanker contains some custom layers and tokens. We recommend use PairRM with our llm-blender code API.
+
  - First install `llm-blender`
  ```bash
  pip install git+https://github.com/yuchenlin/LLM-Blender.git
  ```
 
- - Then load pairranker with the following code:
+ - Then load PairRM:
  ```python
  import llm_blender
  blender = llm_blender.Blender()
@@ -59,23 +61,31 @@ blender.loadranker("llm-blender/PairRM") # load PairRM
 
  ## Usage
 
- ### Use case 1: Compare responses (Quality Evaluator)
+ ### Use case 1: Comparing/Ranking output candidates given an instruction
 
- - Then you can rank candidate responses with the following function
+ - Ranking a list of candidate responses
 
  ```python
- inputs = ["input1", "input2"]
- candidates_texts = [["candidate1 for input1", "candidatefor input1"], ["candidate1 for input2", "candidate2 for input2"]]
+ inputs = ["hello!", "I love you!"]
+ candidates_texts = [["get out!", "hi! nice to meet you!", "bye"],
+                     ["I love you too!", "I hate you!", "Thanks! You're a good guy!"]]
  ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=2)
  # ranks is a list of ranks where ranks[i][j] represents the rank of candidate-j for input-i
+ """
+ ranks -->
+ array([[3, 1, 2],  # it means "hi! nice to meet you!" ranks 1st, "bye" 2nd, and "get out!" 3rd
+        [1, 3, 2]], # it means "I love you too!" ranks 1st, and "I hate you!" 3rd
+       dtype=int32)
+ """
  ```
 
- - Directly compare two candidate responses
+ - Directly comparing two candidate responses
  ```python
- candidates_A = [cands[0] for cands in candidates]
- candidates_B = [cands[1] for cands in candidates]
+ inputs = ["hello!", "I love you!"]
+ candidates_A = ["hi!", "I hate you!"]
+ candidates_B = ["f**k off!", "I love you, too!"]
  comparison_results = blender.compare(inputs, candidates_A, candidates_B)
- # comparison_results is a list of bool, where element[i] denotes whether candidates_A[i] is better than candidates_B[i] for inputs[i]
+ # comparison_results is a list of bool, where comparison_results[i] denotes whether candidates_A[i] is better than candidates_B[i] for inputs[i]
+ # comparison_results[0] --> True
  ```
 
  - Directly compare two multi-turn conversations, where the user's query in each turn is fixed and the responses differ.
@@ -86,7 +96,7 @@ conv1 = [
      "role": "USER"
    },
    {
-     "content": "<assistant response>",
+     "content": "<assistant1's response 1>",
      "role": "ASSISTANT"
    },
    ...
@@ -97,7 +107,7 @@ conv2 = [
      "role": "USER"
    },
    {
-     "content": "<assistant response>",
+     "content": "<assistant2's response 1>",
      "role": "ASSISTANT"
    },
    ...
@@ -106,7 +116,7 @@ comparison_results = blender.compare_conversations([conv1], [conv2])
  # comparison_results is a list of bool, where each element denotes whether the responses in conv1, taken together, are better than those in conv2
  ```
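For reference, a complete `compare_conversations` call might look like the sketch below. It is a minimal, illustrative example: the two-turn conversations are invented, and only the API already shown above is used.

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load PairRM

# Two conversations sharing the same user queries but with different responses.
conv1 = [
    {"content": "What is the capital of France?", "role": "USER"},
    {"content": "The capital of France is Paris.", "role": "ASSISTANT"},
    {"content": "And of Germany?", "role": "USER"},
    {"content": "Germany's capital is Berlin.", "role": "ASSISTANT"},
]
conv2 = [
    {"content": "What is the capital of France?", "role": "USER"},
    {"content": "I'm not sure.", "role": "ASSISTANT"},
    {"content": "And of Germany?", "role": "USER"},
    {"content": "No idea, sorry.", "role": "ASSISTANT"},
]

# One bool per conversation pair: True if conv1's responses are judged better overall.
comparison_results = blender.compare_conversations([conv1], [conv2])
print(comparison_results)  # e.g., [True]
```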
 
- ### Use case 2: Best-of-n sampling (Decoding Enhancing)
+ ### Use case 2: Best-of-n Sampling (Decoding Enhancement)
  **Best-of-n Sampling**, a.k.a. rejection sampling, is a strategy to enhance response quality by selecting the response ranked highest by the reward model (learn more in [OpenAI WebGPT, section 3.2](https://arxiv.org/pdf/2112.09332.pdf) and the [OpenAI blog](https://openai.com/research/measuring-goodharts-law)).
 
  Best-of-n sampling is an easy way to improve your LLM with just a few lines of code. An example of applying it to zephyr is as follows.
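The zephyr example itself is not included in this excerpt. As a rough sketch of the idea (the model name `HuggingFaceH4/zephyr-7b-beta`, the sampling settings, and the rank-1 selection below are illustrative assumptions, not the project's exact code), best-of-n sampling can be built from the `rank` API shown above:

```python
import llm_blender
from transformers import pipeline

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load PairRM

# Assumed base model; any chat-tuned causal LM works the same way.
generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

prompt = "Write a haiku about autumn."
n = 4  # number of candidates to sample

# Sample n diverse candidates (do_sample=True so they differ).
outputs = generator(prompt, do_sample=True, temperature=0.7,
                    max_new_tokens=128, num_return_sequences=n)
candidates = [o["generated_text"][len(prompt):] for o in outputs]

# Rerank the candidates with PairRM and keep the top-ranked one (rank 1 = best).
ranks = blender.rank([prompt], [candidates], return_scores=False, batch_size=1)
best = candidates[list(ranks[0]).index(1)]
print(best)
```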
 
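On the RLHF use mentioned in the introduction, PairRM's judgments can also label preference data. The sketch below assumes that `return_scores=True` (the counterpart of the `return_scores=False` flag used earlier) returns per-candidate scores rather than ranks; treat that flag and the example data as assumptions rather than documented behavior.

```python
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load PairRM

prompts = ["Explain photosynthesis to a child."]
candidates = [["Plants use sunlight to make their own food.",
               "Photosynthesis is the light-driven fixation of CO2."]]

# Assumption: return_scores=True yields per-candidate reward scores
# (higher = preferred) instead of integer ranks.
scores = blender.rank(prompts, candidates, return_scores=True, batch_size=1)

# Turn the scores into a chosen/rejected pair for preference tuning (e.g., DPO),
# or use them directly as a reward signal in an RLHF loop.
chosen = candidates[0][int(scores[0].argmax())]
rejected = candidates[0][int(scores[0].argmin())]
print("chosen:", chosen)
print("rejected:", rejected)
```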