yunconglong committed on
Commit
9309ca4
1 Parent(s): c053b4c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -3,7 +3,7 @@
  ---
 
 
- * [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with the first 50 case of dataset jondurbin/truthy-dpo-v0.1
+ * [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) with jondurbin/truthy-dpo-v0.1
  ```
  DPO Trainer
  TRL supports the DPO Trainer for training language models from preference data, as described in the paper Direct Preference Optimization: Your Language Model is Secretly a Reward Model by Rafailov et al., 2023.