zhilinw committed on
Commit
cfff9a6
1 Parent(s): 32c821a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ Llama-3.1-Nemotron-70B-Reward-HF has been converted from [Llama-3.1-Nemotron-70B
 
  Try hosted inference for free at [build.nvidia.com](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward) - it comes with an OpenAI-compatible API interface and simply signing up gets you 100k free API calls to this model.
 
- Using this reward model for RLHF (specifically, REINFORCE), we were able to tune a Llama-3.1-70B-Instruct model to reach [AlpacaEval 2 LC](https://tatsu-lab.github.io/alpaca_eval/) of 57.6, [Arena Hard](https://github.com/lmarena/arena-hard-auto) of 85.0 and [GPT-4-Turbo MT-Bench](https://github.com/lm-sys/FastChat/pull/3158) of 8.98, which are known to be predictive of [LMSys Chatbot Arena Elo](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)
+ Using this reward model for RLHF (specifically, REINFORCE), we were able to tune a Llama-3.1-70B-Instruct model to reach [AlpacaEval 2 LC](https://tatsu-lab.github.io/alpaca_eval/) of 57.6, [Arena Hard](https://github.com/lmarena/arena-hard-auto) of 85.0 and [GPT-4-Turbo MT-Bench](https://github.com/lm-sys/FastChat/pull/3158) of 8.98, which are known to be predictive of [LMSys Chatbot Arena Elo](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) This Instruct model is available at [Llama-3.1-Nemotron-70B-Instruct](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct) as .nemo model and [Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) as a HF Transformers model.
 
  As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks, edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.
 