Commit 89579a9, committed by hendrydong
1 Parent(s): 06f806b

Update README.md

README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: cc-by-
+license: cc-by-nc-4.0
 ---
 
 This reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.