maywell committed on
Commit
69f3ec1
1 Parent(s): 2fc8ad5

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -1,5 +1,26 @@
  ---
  license: apache-2.0
+ datasets:
+ - openai/summarize_from_feedback
+ - openai/webgpt_comparisons
+ - berkeley-nest/Nectar
+ - Dahoas/instruct-synthetic-prompt-responses
+ - Anthropic/hh-rlhf
+ - lmsys/chatbot_arena_conversations
+ - openbmb/UltraFeedback
+ - argilla/ultrafeedback-binarized-preferences-cleaned
+ metrics:
+ - accuracy
+ tags:
+ - reward_model
+ - reward-model
+ - RLHF
+ - evaluation
+ - llm
+ - instruction
+ - reranking
+ language:
+ - en
  ---
  # Better Implementation for [*PairRM*](https://huggingface.co/llm-blender/PairRM)