Update README.md
README.md CHANGED
@@ -17,7 +17,16 @@ datasets:
 
 ## Description:
 
-Llama-3.1-Nemotron-70B-Reward is a large language model customized by NVIDIA to predict the quality of LLM-generated responses.
+Llama-3.1-Nemotron-70B-Reward is a large language model customized by NVIDIA to predict the quality of LLM-generated responses. Specifically, it was trained from a Llama-3.1-70B-Instruct base using a novel approach that combines the strengths of Bradley-Terry and SteerLM Regression Reward Modelling.
+
+Given a conversation with multiple turns between a user and an assistant, it rates the quality of the final assistant turn with a reward score.
+
+For the same prompt, a response with a higher reward score is of higher quality than one with a lower reward score, but the same cannot be said when comparing scores of responses to different prompts.
+
+
+
+
+
 
 ## Terms of use
 
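The comparison rule in the added text (scores are only meaningful between responses to the *same* prompt) maps naturally onto response ranking. Below is a minimal sketch of that usage, assuming the reward model is served behind an OpenAI-compatible chat endpoint and that it returns its score as text in the final message; the base URL, model id, and score format shown here are assumptions for illustration, not part of this change.

```python
# Sketch: rank two candidate answers to the SAME prompt by reward score.
# Assumptions (not confirmed by this README change): the model is reachable at an
# OpenAI-compatible endpoint, the model id below is valid there, and the reply
# text looks like "reward: 3.2".
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key="<your api key>",
)

PROMPT = "Explain the difference between a list and a tuple in Python."
candidates = [
    "A list is mutable, a tuple is immutable; both store ordered items.",
    "They are exactly the same thing.",
]

def reward_score(prompt: str, response: str) -> float:
    """Score the final assistant turn of a single-turn conversation."""
    completion = client.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-70b-reward",  # assumed model id
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ],
    )
    # Assumed reply format: "reward: <number>".
    return float(completion.choices[0].message.content.split(":")[-1])

scores = [reward_score(PROMPT, c) for c in candidates]
print(scores, "-> best:", candidates[scores.index(max(scores))])
# Note: these scores should not be compared with scores for a different prompt.
```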