Update README.md
README.md
CHANGED
@@ -23,8 +23,37 @@ Llama-3.1-Nemotron-70B-Reward is a large language model customized using develop
By accessing this model, you are agreeing to the Llama 3.1 terms and conditions of the [license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE), [acceptable use policy](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md) and [Meta’s privacy policy](https://www.facebook.com/privacy/policy/).
-## Usage:
+| Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+|:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
+| _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+| Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+| TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
+| Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
+| SFR-LLaMa-3.1-70B-Judge-r | Not fully disclosed | 92.7 | 96.9 | 84.8 | 91.6 | 97.6 |
+| Nemotron-4-340B-Reward | Permissive Licensed Data Only (CC-BY-4.0) | 92.0 | 95.8 | 87.1 | 91.5 | 93.7 |
+| ArmoRM-Llama3-8B-v0.1 | Includes GPT4 Generated Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
+| Cohere May 2024 | Not disclosed | 89.5 | 96.4 | 71.3 | 92.7 | 97.7 |
+| Llama3-70B-SteerLM-RM | Permissive Licensed Data Only (CC-BY-4.0) | 88.8 | 91.3 | 80.3 | 92.8 | 90.7 |
+| Google Gemini Pro 1.5 | Not disclosed | 88.1 | 92.3 | 80.6 | 87.5 | 92.0 |
+| GPT-4o-2024-08-06 | Not disclosed | 86.7 | 96.1 | 76.1 | 88.1 | 86.6 |
+| claude-3-5-sonnet-20240620 | Not disclosed | 84.2 | 96.4 | 74.0 | 81.6 | 84.7 |
+| Meta-Llama-3.1-70B-Instruct | Not fully disclosed | 84.0 | 97.2 | 70.2 | 82.8 | 86.0 |
+
+
+As shown above, Llama-3.1-Nemotron-70B-Reward performs best overall as well as in the Chat, Safety and Reasoning categories.
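As a quick sanity check on this claim, the sketch below re-enters the scores from the first results table and finds the top model in each category. It is a minimal illustration rather than part of the model's tooling: the names `ROWS`, `CATEGORIES`, `best_per_category` and `mean_of_categories` are hypothetical, and treating Overall as the unweighted mean of the four category scores is an assumption that the reported numbers are merely consistent with (to within rounding).

```python
# Scores transcribed from the first results table.
# Each entry: model -> (Overall, Chat, Chat-Hard, Safety, Reasoning).
CATEGORIES = ("Chat", "Chat-Hard", "Safety", "Reasoning")
ROWS = {
    "Llama-3.1-Nemotron-70B-Reward": (94.1, 97.5, 85.8, 95.1, 98.1),
    "Skywork-Reward-Gemma-2-27B": (93.8, 95.8, 91.4, 91.9, 96.1),
    "TextEval-Llama3.1-70B": (93.5, 94.1, 90.1, 93.2, 96.4),
    "Skywork-Critic-Llama-3.1-70B": (93.3, 96.6, 87.9, 93.1, 95.5),
    "SFR-LLaMa-3.1-70B-Judge-r": (92.7, 96.9, 84.8, 91.6, 97.6),
    "Nemotron-4-340B-Reward": (92.0, 95.8, 87.1, 91.5, 93.7),
    "ArmoRM-Llama3-8B-v0.1": (90.8, 96.9, 76.8, 92.2, 97.3),
    "Cohere May 2024": (89.5, 96.4, 71.3, 92.7, 97.7),
    "Llama3-70B-SteerLM-RM": (88.8, 91.3, 80.3, 92.8, 90.7),
    "Google Gemini Pro 1.5": (88.1, 92.3, 80.6, 87.5, 92.0),
    "GPT-4o-2024-08-06": (86.7, 96.1, 76.1, 88.1, 86.6),
    "claude-3-5-sonnet-20240620": (84.2, 96.4, 74.0, 81.6, 84.7),
    "Meta-Llama-3.1-70B-Instruct": (84.0, 97.2, 70.2, 82.8, 86.0),
}


def best_per_category(rows):
    """Map each category name to the model with the highest score in it."""
    return {
        name: max(rows, key=lambda model: rows[model][i + 1])
        for i, name in enumerate(CATEGORIES)
    }


def mean_of_categories(row):
    """Unweighted mean of the four category scores (assumed definition of Overall)."""
    category_scores = row[1:]
    return sum(category_scores) / len(category_scores)


leaders = best_per_category(ROWS)
for category, model in leaders.items():
    print(f"{category}: {model}")
```

Running it reports Skywork-Reward-Gemma-2-27B as the Chat-Hard leader and Llama-3.1-Nemotron-70B-Reward as the leader in the other three categories, matching the bolded cells in the table.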
+
+To better understand why the model struggles in the Chat-Hard category, we analyzed the scores for each constituent subset of Chat-Hard.
+
+| Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+|:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
+| _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+| Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+
+
+Last updated: 27 Sept 2024
+
+## Usage:
You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) by following the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).