yunconglong
committed on
Commit 9309ca4
1 Parent(s): c053b4c
Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@
---
-* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.

---
+* [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with jondurbin/truthy-dpo-v0.1
```
DPO Trainer
TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.
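
The README text quoted in the diff names the DPO objective without showing it. As a rough illustration only, the per-example loss from Rafailov et al. can be sketched in plain Python. This is not TRL's implementation and is not part of the commit; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this sketch. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    All names here are hypothetical; beta=0.1 is an illustrative default.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch on the sign
    # so exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model agree on both responses, the margin is zero and the loss equals `math.log(2)`. Raising the policy's log-probability of the chosen response relative to the rejected one lowers the loss, which is the signal a preference dataset such as jondurbin/truthy-dpo-v0.1 would supply during training.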