6fd424449d2a16ad785e7289c4a376277b3c94c6a13af7c05c8c78bd00507d2f
README.md
CHANGED
@@ -33,6 +33,7 @@ with torch.no_grad():
 # reward: 0.76
 ```
 模型可以较为准确地判断文本重复,异常中断和不符合指令要求等低质量模型生成结果,并给出较低的奖励值。
+
 The model can fairly accurately identify low-quality generations, such as repeated text, abnormal truncation, and failure to follow the instruction, and assigns them lower reward values.
 
 ```python
@@ -52,8 +53,11 @@ with torch.no_grad():
 print(reward.tolist())
 #reward: [0.76, -1.36, -2.99, -1.82]
 ```
+
 模型能够对比对同一指令的不同生成结果,并根据质量给出奖励值。
+
 The model can compare different generations for the same instruction and assign each a reward value according to its quality.
+
 ```python
 prefix_user = "Human:"
 prefix_bot = "\n\nAssistant:"