IDEA-CCNL/Ziya-LLaMA-7B-Reward

@@ -33,6 +33,7 @@ with torch.no_grad():
     # reward: 0.76
 ```
 The model can fairly accurately identify low-quality generations, such as repeated text, abrupt truncation, and failure to follow the instruction, and assigns them lower reward values.
 ```python
@@ -52,8 +53,11 @@ with torch.no_grad():
     print(reward.tolist())
     # reward: [0.76, -1.36, -2.99, -1.82]
 ```
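
As a rough illustration of how a batch of rewards like the one printed above might be consumed downstream, the sketch below ranks candidates by score and flags the low-scoring ones. The reward values are copied from the example output; the candidate labels, the `rank_by_reward` helper, and the `LOW_QUALITY_THRESHOLD` cutoff are assumptions made for illustration and are not part of the model card.

```python
# Reward values copied from the batch example above. The candidate labels are
# hypothetical placeholders standing in for the actual response texts.
rewards = [0.76, -1.36, -2.99, -1.82]
candidates = ["response_a", "response_b", "response_c", "response_d"]


def rank_by_reward(candidates, rewards):
    """Return (candidate, reward) pairs sorted from highest to lowest reward."""
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    return sorted(zip(candidates, rewards), key=lambda pair: pair[1], reverse=True)


ranked = rank_by_reward(candidates, rewards)
best_candidate, best_reward = ranked[0]

# Hypothetical cutoff: anything scoring below it is treated as low quality.
LOW_QUALITY_THRESHOLD = 0.0
rejected = [name for name, score in ranked if score < LOW_QUALITY_THRESHOLD]

print(best_candidate, best_reward)
print(rejected)
```

A fixed threshold is only one possible choice; since reward values are most meaningful relative to each other, keeping the top-ranked candidate per instruction may be the safer default.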
 The model can compare different responses to the same instruction and assign each a reward value according to its quality.
 ```python
 prefix_user = "Human:"
 prefix_bot = "\n\nAssistant:"