Dongfu Jiang committed
Commit 8f55e3b
1 parent: 9777535

Update README.md

Files changed (1): README.md (+8, -9)
README.md CHANGED
@@ -193,6 +193,13 @@ Learn more in our LLM-Blender Github [README.md](https://github.com/yuchenlin/LL
 | [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) (our previous version) | 128 | 128 | 384 |
 | [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) (This model) | 1224 | 412 | 2048 |
 
+### Training Datasets
+- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
+- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
+- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
+- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
+- [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
+- [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)
+
 ### Performance
 PairRM has been trained on various high-quality and large-scale dataset with human preference annotations and exhibits great correlation with human preferences
@@ -203,15 +210,7 @@ We test the pairwise comparison on
 - [HHH-alignment](https://huggingface.co/datasets/HuggingFaceH4/hhh_alignment)
 - [MT-bench-human-judgements](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)
 
-
-### Training Datasets
-- [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
-- [openai/webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
-- [Dahoas/instruct-synthetic-prompt-responses](https://huggingface.co/datasets/Dahoas/instruct-synthetic-prompt-responses)
-- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
-- [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
-- [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback)
-
+All following results are reported as pairwise comparison accuracies (agreements).
 
 
 #### Auto-J Pairwise test data performance
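For context on the metric this commit documents: a pairwise comparison accuracy (agreement) is simply the fraction of human-annotated preference pairs on which the model's preferred response matches the human's choice. A minimal sketch, using hypothetical preference labels (0 = first candidate wins, 1 = second candidate wins) rather than any real benchmark data:

```python
def pairwise_accuracy(model_prefs, human_prefs):
    """Fraction of comparison pairs where the model's preferred
    candidate agrees with the human annotation."""
    assert len(model_prefs) == len(human_prefs) and model_prefs
    agree = sum(m == h for m, h in zip(model_prefs, human_prefs))
    return agree / len(human_prefs)

# Illustrative preferences over 5 comparison pairs (made-up data).
model = [0, 1, 1, 0, 1]
human = [0, 1, 0, 0, 1]
print(pairwise_accuracy(model, human))  # 4 of 5 pairs agree -> 0.8
```

This is only an illustration of how such agreement numbers are computed; the benchmark-specific evaluation scripts live in the LLM-Blender repository.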