nvidia
/

Llama-3.1-Nemotron-70B-Reward-HF

Model card Files Files and versions Community

zhilinw commited on Oct 15

Commit

cfff9a6

•

1 Parent(s): 32c821a

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -26,7 +26,7 @@ Llama-3.1-Nemotron-70B-Reward-HF has been converted from [Llama-3.1-Nemotron-70B
 Try hosted inference for free at [build.nvidia.com](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward) - it comes with an OpenAI-compatible API interface and simply signing up gets you 100k free API calls to this model.
-Using this reward model for RLHF (specifically, REINFORCE), we were able to tune a Llama-3.1-70B-Instruct model to reach [AlpacaEval 2 LC](https://tatsu-lab.github.io/alpaca_eval/) of 57.6, [Arena Hard](https://github.com/lmarena/arena-hard-auto) of 85.0 and [GPT-4-Turbo MT-Bench](https://github.com/lm-sys/FastChat/pull/3158) of 8.98, which are known to be predictive of [LMSys Chatbot Arena Elo](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
 As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

 Try hosted inference for free at [build.nvidia.com](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward) - it comes with an OpenAI-compatible API interface and simply signing up gets you 100k free API calls to this model.
+Using this reward model for RLHF (specifically, REINFORCE), we were able to tune a Llama-3.1-70B-Instruct model to reach [AlpacaEval 2 LC](https://tatsu-lab.github.io/alpaca_eval/) of 57.6, [Arena Hard](https://github.com/lmarena/arena-hard-auto) of 85.0 and [GPT-4-Turbo MT-Bench](https://github.com/lm-sys/FastChat/pull/3158) of 8.98, which are known to be predictive of [LMSys Chatbot Arena Elo](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) This Instruct model is available at [Llama-3.1-Nemotron-70B-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) as .nemo model and [Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) as a HF Transformers model.
 As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.