| Reward Model | Preference Datasets Used |
| --- | --- |
| RLHFlow/ArmoRM-Llama3-8B-v0.1 | HelpSteer, UltraFeedback, BeaverTails, Argilla-Capybara, Argilla-Math-Preferences, CodeUltraFeedback, Argilla-OpenOrca |
| RLHFlow/pair-preference-model-LLaMA3-8B | Filtered HH-RLHF, SHP, HelpSteer, SafeRLHF-30k, UltraFeedback, UltraInteract, CodeUltraFeedback, Argilla-Math, OpenOrca, Capybara |
| sfairXC/FsfairX-LLaMA3-RM-v0.1 | Filtered HH-RLHF, SHP, HelpSteer, SafeRLHF-30k, UltraFeedback, UltraInteract, CodeUltraFeedback, Argilla-Math, OpenOrca, Capybara |
| openbmb/Eurus-RM-7b | UltraInteract, UltraFeedback, UltraSafety |
| Nexusflow/Starling-RM-34B | Nectar |
| weqweasdas/RM-Mistral-7B | HH-RLHF, Capybara, Orca, SHP, UltraFeedback, HelpSteer, PKU-SafeRLHF, PKU-SafeRLHF-30k |
| hendrydong/Mistral-RM-for-RAFT-GSHF-v0 | Undisclosed |
| stabilityai/stablelm-2-12b-chat | HH-RLHF, argilla/dpo-mix-7k, and other undisclosed datasets |
| Ray2333/reward-model-Mistral-7B-instruct... | Summarize, WebGPT, Dahoas/instruct-synthetic-prompt-responses, HH-RLHF, ChatBotArena Conversations, UltraFeedback, Nectar |
| allenai/tulu-2-dpo-70b | UltraFeedback |
| meta-llama/Meta-Llama-3-70B-Instruct | Undisclosed |
| prometheus-eval/prometheus-8x7b-v2.0 | Preference Collection (relabeled mix) |
| NousResearch/Nous-Hermes-2-Mistral-7B-DPO | Undisclosed |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Undisclosed |
| upstage/SOLAR-10.7B-Instruct-v1.0 | OpenOrca, Intel-Orca, UltraFeedback |
| HuggingFaceH4/zephyr-7b-alpha | UltraFeedback |
| allenai/tulu-2-dpo-13b | UltraFeedback |
| 0-hero/Matter-0.1-7B-boost-DPO-preview | Undisclosed |
| prometheus-eval/prometheus-7b-v2.0 | Preference Collection (relabeled mix) |
| HuggingFaceH4/starchat2-15b-v0.1 | UltraFeedback, Orca |
| HuggingFaceH4/zephyr-7b-beta | UltraFeedback |
| allenai/tulu-2-dpo-7b | UltraFeedback |
| jondurbin/bagel-dpo-34b-v0.5 | Airoboros 3.2, Contextual DPO, HelpSteer, Orca, Gutenberg-DPO, Python DPO, Toxic DPO, Truthy, UltraFeedback |
| berkeley-nest/Starling-RM-7B-alpha | Nectar |
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | Undisclosed |
| 0-hero/Matter-0.1-7B-DPO-preview | Undisclosed |
| stabilityai/stablelm-zephyr-3b | UltraFeedback, Orca |
| Qwen/Qwen1.5-14B-Chat | Undisclosed |
| CohereForAI/c4ai-command-r-plus | Undisclosed |
| OpenAssistant/oasst-rm-2.1-pythia-1.4b-epoch-2.5 | WebGPT, HH-RLHF, SHP, Summarize |
| Qwen/Qwen1.5-7B-Chat | Undisclosed |
| weqweasdas/RM-Gemma-7B | HH-RLHF, SHP, UltraFeedback, Capybara, HelpSteer, Orca |
| openbmb/Eurus-7b-kto | UltraInteract, UltraFeedback |
| Qwen/Qwen1.5-72B-Chat | Undisclosed |
| openbmb/UltraRM-13b | UltraFeedback, HH-RLHF, SHP, Summarize |