zhilinw committed dd48622 (1 parent: 06fb6fa)

Update README.md

Files changed (1): README.md (+30 -1)

README.md CHANGED
@@ -23,8 +23,37 @@ Llama-3.1-Nemotron-70B-Reward is a large language model customized using develop
By accessing this model, you are agreeing to the LLama 3.1 terms and conditions of the [license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE), [acceptable use policy](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/USE_POLICY.md) and [Meta’s privacy policy](https://www.facebook.com/privacy/policy/)

- ## Usage:
+ | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+ |:------------------------------------|:------------------------------------------|:---------|:---------|:----------|:---------|:----------|
+ | _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+ | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+ | TextEval-Llama3.1-70B | Not disclosed | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
+ | Skywork-Critic-Llama-3.1-70B | Not fully disclosed | 93.3 | 96.6 | 87.9 | 93.1 | 95.5 |
+ | SFR-LLaMa-3.1-70B-Judge-r | Not fully disclosed | 92.7 | 96.9 | 84.8 | 91.6 | 97.6 |
+ | Nemotron-4-340B-Reward | Permissive Licensed Data Only (CC-BY-4.0) | 92.0 | 95.8 | 87.1 | 91.5 | 93.7 |
+ | ArmoRM-Llama3-8B-v0.1 | Includes GPT4 Generated Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
+ | Cohere May 2024 | Not disclosed | 89.5 | 96.4 | 71.3 | 92.7 | 97.7 |
+ | Llama3-70B-SteerLM-RM | Permissive Licensed Data Only (CC-BY-4.0) | 88.8 | 91.3 | 80.3 | 92.8 | 90.7 |
+ | Google Gemini Pro 1.5 | Not disclosed | 88.1 | 92.3 | 80.6 | 87.5 | 92.0 |
+ | GPT-4o-2024-08-06 | Not disclosed | 86.7 | 96.1 | 76.1 | 88.1 | 86.6 |
+ | claude-3-5-sonnet-20240620 | Not disclosed | 84.2 | 96.4 | 74.0 | 81.6 | 84.7 |
+ | Meta-Llama-3.1-70B-Instruct | Not fully disclosed | 84.0 | 97.2 | 70.2 | 82.8 | 86.0 |
+
+ As shown above, Llama-3.1-Nemotron-70B-Reward performs best overall, as well as in the Chat, Safety and Reasoning categories.
+
+ To better understand why it struggles in the Chat-Hard category, we analyzed the scores for each constituent subset of the Chat-Hard category.
+
+ | Model | Type of Data Used For Training | Overall | Chat | Chat-Hard | Safety | Reasoning |
+ |:------------------------------------|:------------------------------------------|:---------|:---------|:----------|:---------|:----------|
+ | _**Llama-3.1-Nemotron-70B-Reward**_ | Permissive Licensed Data Only (CC-BY-4.0) | **94.1** | **97.5** | 85.8 | **95.1** | **98.1** |
+ | Skywork-Reward-Gemma-2-27B | Includes GPT4 Generated Data | 93.8 | 95.8 | **91.4** | 91.9 | 96.1 |
+
+ Last updated: 27 Sept 2024
+
+ ## Usage:

You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) following the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).
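Beyond training with NeMo Aligner, a served copy of the reward model can be queried directly to score a prompt-response pair. The snippet below is a minimal sketch, assuming the model has been deployed behind an OpenAI-compatible endpoint; the base URL, model identifier, and the convention of returning the scalar reward in the assistant message are assumptions about a particular deployment, not something this commit specifies.

```python
# Minimal sketch: scoring a prompt-response pair via an OpenAI-compatible endpoint.
# The base_url, model id, and response format are assumptions for illustration;
# substitute the values used by your own reward-model deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-reward-endpoint/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-reward",  # assumed model identifier
    messages=[
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
    ],
)

# Many reward-model deployments return the scalar reward in the assistant
# message content (e.g. "reward: 3.2"); parse it per your server's convention.
print(completion.choices[0].message.content)
```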