Commit 8ac3d10 (verified) · Parent: 7d174e2
zhilinw committed

Update README.md

Files changed (1): README.md (+1 -3)
README.md CHANGED

@@ -26,7 +26,7 @@ By accessing this model, you are agreeing to the LLama 3.1 terms and conditions
 
 ## RewardBench Primary Dataset LeaderBoard
 
-Llama-3.1-Nemotron-70B-Reward performs best Overall on RewardBench as well as in Chat, Safety and Reasoning category.
+As of 27 Sept 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on RewardBench as well as in Chat, Safety and Reasoning category.
 
 | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
 |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
@@ -57,8 +57,6 @@ On the other hand, when GPT-4 annotations are used as Ground-Truth, we trail sub
 | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 91.4 | 78.3 | 89.6 | 96.0 | 97.8 | 91.5 | 86.5|
 
 
-Last updated: 27 Sept 2024
-
 ## Usage:
 
 You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) following [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).