zhilinw committed dd48622 (1 parent: 06fb6fa)

Update README.md

Files changed (1): README.md (+30 -1)

README.md CHANGED
@@ -23,8 +23,37 @@ Llama-3.1-Nemotron-70B-Reward is a large language model customized using develop
By accessing this model, you are agreeing to the LLama 3.1 terms and conditions of the [license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE), [acceptable use policy](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md) and [Meta’s privacy policy](https://www.facebook.com/privacy/policy/)

- ## Usage:
+ | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+ |:------------------------------------|:------------------------------------------|:---------|:---------|:----------|:---------|:----------|
+ | _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+ | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+ | TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
+ | Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
+ | SFR-LLaMa-3.1-70B-Judge-r | Not fully disclosed | 92.7 | 96.9 | 84.8 | 91.6 | 97.6 |
+ | Nemotron-4-340B-Reward | Permissive Licensed Data Only (CC-BY-4.0) | 92.0 | 95.8 | 87.1 | 91.5 | 93.7 |
+ | ArmoRM-Llama3-8B-v0.1 | Includes GPT4 Generated Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
+ | Cohere May 2024 | Not disclosed | 89.5 | 96.4 | 71.3 | 92.7 | 97.7 |
+ | Llama3-70B-SteerLM-RM | Permissive Licensed Data Only (CC-BY-4.0) | 88.8 | 91.3 | 80.3 | 92.8 | 90.7 |
+ | Google Gemini Pro 1.5 | Not disclosed | 88.1 | 92.3 | 80.6 | 87.5 | 92.0 |
+ | GPT-4o-2024-08-06 | Not disclosed | 86.7 | 96.1 | 76.1 | 88.1 | 86.6 |
+ | claude-3-5-sonnet-20240620 | Not disclosed | 84.2 | 96.4 | 74.0 | 81.6 | 84.7 |
+ | Meta-Llama-3.1-70B-Instruct | Not fully disclosed | 84.0 | 97.2 | 70.2 | 82.8 | 86.0 |
+
+ As shown above, Llama-3.1-Nemotron-70B-Reward performs best overall, as well as in the Chat, Safety and Reasoning categories.
+
+ To better understand why it struggles in the Chat-Hard category, we analyzed the scores for each constituent subset of the Chat-Hard category.
+
+ | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+ |:------------------------------------|:------------------------------------------|:---------|:---------|:----------|:---------|:----------|
+ | _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+ | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+
+ Last updated: 27 Sept 2024
+
+ ## Usage:

You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) following the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).
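Beyond training with NeMo Aligner, a served copy of the reward model can be queried directly to score a prompt-response pair. The snippet below is a minimal sketch, assuming the model has been deployed behind an OpenAI-compatible endpoint; the base URL, model identifier, and the convention of returning the scalar reward in the assistant message are assumptions about a particular deployment, not something this commit specifies.

```python
# Minimal sketch: scoring a prompt-response pair via an OpenAI-compatible endpoint.
# The base_url, model id, and response format are assumptions for illustration;
# substitute the values used by your own reward-model deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-reward-endpoint/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-reward",  # assumed model identifier
    messages=[
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
    ],
)

# Many reward-model deployments return the scalar reward in the assistant
# message content (e.g. "reward: 3.2"); parse it per your server's convention.
print(completion.choices[0].message.content)
```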