Tags: NeMo · English · nvidia · llama3.1 · reward model

zhilinw committed (verified) · Commit 6122111 · 1 Parent(s): 4213126

Update README.md

Files changed (1):
  1. README.md (+10 -1)
README.md CHANGED
@@ -17,7 +17,16 @@ datasets:
 
 ## Description:
 
- Llama-3.1-Nemotron-70B-Reward is a large language model customized using developed by NVIDIA to predict the quality of LLM generated responses.
+ Llama-3.1-Nemotron-70B-Reward is a large language model customized by NVIDIA to predict the quality of LLM-generated responses. Specifically, it was trained on top of a Llama-3.1-70B-Instruct base using a novel approach combining the strengths of Bradley-Terry and SteerLM Regression Reward Modelling.
+
+ Given a conversation with multiple turns between a user and an assistant, it rates the quality of the final assistant turn using a reward score.
+
+ For the same prompt, a response with a higher reward score is of higher quality than one with a lower reward score, but the same cannot be said when comparing scores between responses to different prompts.
+
+
+
+
+
 
 ## Terms of use
 
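As a rough illustration of the within-prompt comparison described above, here is a minimal Python sketch of reward-based response selection. The function `pick_best_response`, the `score_fn` callback, and the message format are hypothetical placeholders for however the reward model is actually served; they are not an API defined by this model or this commit.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def pick_best_response(
    prompt_turns: List[Message],
    candidates: List[str],
    score_fn: Callable[[List[Message]], float],
) -> str:
    """Return the candidate whose final assistant turn receives the highest reward.

    `score_fn` is assumed to wrap a deployed reward model and return a scalar
    score for the last assistant turn of the conversation it is given.
    """

    def score(candidate: str) -> float:
        # Append the candidate as the final assistant turn and score it.
        conversation = prompt_turns + [{"role": "assistant", "content": candidate}]
        return score_fn(conversation)

    # Scores are only comparable here because every candidate answers the same
    # prompt; comparing reward scores across different prompts is not meaningful.
    return max(candidates, key=score)
```

In use, `prompt_turns` would hold the user (and any prior assistant) turns, and each candidate response would be scored against that same prompt before selecting the highest-scoring one.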