Tags: Text Generation · Transformers · Safetensors · English · deberta · reward_model · reward-model · RLHF · evaluation · llm · instruction · reranking · Inference Endpoints
yuchenlin committed
Commit 94512ba
1 Parent(s): 345b1ee

Update README.md

Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -28,15 +28,6 @@ Inspired by [DeBERTa Reward Model Series](https://huggingface.co/OpenAssistant/r
 - Paper: [https://arxiv.org/abs/2306.02561](https://arxiv.org/abs/2306.02561)
 - Space Demo: [https://huggingface.co/spaces/llm-blender/LLM-Blender](https://huggingface.co/spaces/llm-blender/LLM-Blender)
 
-
-## Statistics
-
-### Context length
-| PairRanker type | Source max length | Candidate max length | Total max length |
-|:-----------------:|:-----------------:|----------------------|------------------|
-| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
-| [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
-
 ## Usage Example
 
 ### Installation
@@ -141,6 +132,18 @@ With a `blender.compare()` function, you can easily apply PairRM to popular RLH
 
 Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LLM-Blender#rank-and-fusion)
 
+
+
+
+## Statistics
+
+### Context length
+| PairRanker type | Source max length | Candidate max length | Total max length |
+|:-----------------:|:-----------------:|----------------------|------------------|
+| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
+| [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
+
+
 ### Performance
 PairRM has been trained on various high-quality and large-scale datasets with human preference annotations and exhibits great correlation with human preferences
 with an extremely small model size (0.4B), approaching the performance of GPT-4.
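
The second hunk's context line mentions using `blender.compare()` to plug PairRM into popular RLHF toolkits. For readers landing on this commit, here is a minimal usage sketch along those lines, assuming the `llm_blender` package from the LLM-Blender repository is installed (for example via `pip install git+https://github.com/yuchenlin/LLM-Blender.git`); the prompts and candidate responses below are invented purely for illustration.

```python
# Minimal sketch: pairwise comparison with PairRM via the llm_blender package.
# Assumes: pip install git+https://github.com/yuchenlin/LLM-Blender.git
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the PairRM ranker checkpoint

# Toy prompts with two candidate responses each (illustrative data only).
inputs = ["Summarize: The cat sat on the mat.", "Translate to French: Good morning."]
candidates_A = ["A cat was sitting on a mat.", "Bonjour."]
candidates_B = ["The mat sat on the cat.", "Bonsoir."]

# For each input, compare() indicates whether candidate A is preferred over candidate B.
comparison_results = blender.compare(inputs, candidates_A, candidates_B)
print(comparison_results)  # e.g. [True, True]
```

The same pairwise call is what RLHF-style recipes (e.g. best-of-n sampling or preference data labeling) would use to score candidate pairs with this model.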