Update README.md
README.md CHANGED
@@ -26,7 +26,7 @@ By accessing this model, you are agreeing to the LLama 3.1 terms and conditions
 
 ## RewardBench Primary Dataset LeaderBoard
 
-Llama-3.1-Nemotron-70B-Reward performs best Overall on RewardBench as well as in Chat, Safety and Reasoning category.
+As of 27 Sept 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on RewardBench as well as in Chat, Safety and Reasoning category.
 
 | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
 |:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
@@ -57,8 +57,6 @@ On the other hand, when GPT-4 annotations are used as Ground-Truth, we trail sub
 | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 91.4 | 78.3 | 89.6 | 96.0 | 97.8 | 91.5 | 86.5|
 
 
-Last updated: 27 Sept 2024
-
 ## Usage:
 
 You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) following [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).
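
For orientation on the unchanged usage pointer above, here is a minimal client-side sketch of scoring a prompt/response pair against a reward model that has already been served for inference. The endpoint URL, route, and JSON payload keys below are illustrative assumptions, not NeMo Aligner's documented interface; the SteerLM training user guide linked above describes the actual serving and scoring workflow.

```python
# Minimal sketch of querying a served reward model over HTTP.
# NOTE: the host, port, route, and payload/response keys are illustrative
# assumptions, NOT NeMo Aligner's documented API -- see the SteerLM training
# user guide linked above for the actual serving and scoring workflow.
import requests

REWARD_ENDPOINT = "http://localhost:5555/score"  # assumed host/port/route


def score_response(prompt: str, response: str) -> float:
    """Send a prompt/response pair to the served reward model and return a scalar reward."""
    payload = {"prompt": prompt, "response": response}  # assumed payload keys
    r = requests.post(REWARD_ENDPOINT, json=payload, timeout=60)
    r.raise_for_status()
    return float(r.json()["reward"])  # assumed response field


if __name__ == "__main__":
    reward = score_response("What is 2+2?", "2+2 equals 4.")
    print(f"Reward: {reward:.3f}")
```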