Tags: NeMo · English · nvidia · llama3.1 · reward model

zhilinw committed (verified) · Commit 6122111 · 1 Parent(s): 4213126

Update README.md

Files changed (1):
  1. README.md (+10 -1)
README.md CHANGED
@@ -17,7 +17,16 @@ datasets:
 
 ## Description:
 
- Llama-3.1-Nemotron-70B-Reward is a large language model customized using developed by NVIDIA to predict the quality of LLM generated responses.
+ Llama-3.1-Nemotron-70B-Reward is a large language model customized by NVIDIA to predict the quality of LLM-generated responses. Specifically, it was trained on top of a Llama-3.1-70B-Instruct base using a novel approach combining the strengths of Bradley-Terry and SteerLM Regression Reward Modelling.
+
+ Given a conversation with multiple turns between a user and an assistant, it rates the quality of the final assistant turn using a reward score.
+
+ For the same prompt, a response with a higher reward score is of higher quality than one with a lower reward score, but the same cannot be said when comparing scores between responses to different prompts.
+
+
+
+
+
 
 ## Terms of use
 
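As a rough illustration of the within-prompt comparison described above, here is a minimal Python sketch of reward-based response selection. The function `pick_best_response`, the `score_fn` callback, and the message format are hypothetical placeholders for however the reward model is actually served; they are not an API defined by this model or this commit.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def pick_best_response(
    prompt_turns: List[Message],
    candidates: List[str],
    score_fn: Callable[[List[Message]], float],
) -> str:
    """Return the candidate whose final assistant turn receives the highest reward.

    `score_fn` is assumed to wrap a deployed reward model and return a scalar
    score for the last assistant turn of the conversation it is given.
    """

    def score(candidate: str) -> float:
        # Append the candidate as the final assistant turn and score it.
        conversation = prompt_turns + [{"role": "assistant", "content": candidate}]
        return score_fn(conversation)

    # Scores are only comparable here because every candidate answers the same
    # prompt; comparing reward scores across different prompts is not meaningful.
    return max(candidates, key=score)
```

In use, `prompt_turns` would hold the user (and any prior assistant) turns, and each candidate response would be scored against that same prompt before selecting the highest-scoring one.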