NeMo
English
nvidia
llama3.1
reward model
zhilinw commited on
Commit
31c8d37
·
verified ·
1 Parent(s): 6122111

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -55,7 +55,7 @@ As of 27 Sept 2024, Llama-3.1-Nemotron-70B-Reward performs best Overall on Rewar
55
 
56
 
57
  To better understand why Llama-3.1-Nemotron-70B-Reward does less well in the Chat-Hard category, we analyze the scores for each consistutent subset under the Chat-Hard category. We find that on categories that uses human annotations as ground truth, Llama-3.1-Nemotron-70B-Reward performs similar to Skywork-Reward-Gemma-2-27B (<= 2.2% difference).
58
- On the other hand, when GPT-4 annotations are used as Ground-Truth, Llama-3.1-Nemotron-70B-Reward trails substantially behind Skywork-Reward-Gemma-2-27B (10.8 to 19.2%). This suggests that Skywork-Reward-Gemma-2-27B can better modelling GPT-4 preference (than human-annotated preferences), likely contributed by the inclusion of GPT-4 annotated training data used to train it found in the [OffSetBias dataset](https://huggingface.co/datasets/NCSOFT/offsetbias) as part of the [Skywork-Reward-Preference-80k](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1).
59
 
60
 
61
 
 
55
 
56
 
57
  To better understand why Llama-3.1-Nemotron-70B-Reward does less well in the Chat-Hard category, we analyze the scores for each consistutent subset under the Chat-Hard category. We find that on categories that uses human annotations as ground truth, Llama-3.1-Nemotron-70B-Reward performs similar to Skywork-Reward-Gemma-2-27B (<= 2.2% difference).
58
+ On the other hand, when GPT-4 annotations are used as Ground-Truth, Llama-3.1-Nemotron-70B-Reward trails substantially behind Skywork-Reward-Gemma-2-27B (by 10.8 to 19.2%). This suggests that Skywork-Reward-Gemma-2-27B can better modelling GPT-4 preferences (but not human-annotated preferences), likely contributed by the inclusion of GPT-4 annotated training data used to train it found in the [OffSetBias dataset](https://huggingface.co/datasets/NCSOFT/offsetbias) as part of the [Skywork-Reward-Preference-80k](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.1).
59
 
60
 
61