This repository provides a version of Pythia-2.8B fine-tuned with our proposed [SamPO](https://github.com/LuJunru/SamPO) algorithm: *Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence*.
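As a quick-start sketch (not part of the original README), the checkpoint can be loaded with the Hugging Face `transformers` library. The repo id below is inferred from the test-set link later in this README, and the prompt format is an assumption based on common TL;DR setups:

```python
# Minimal usage sketch. Assumptions: the repo id (inferred from the test-set
# link in this README) and the TL;DR prompt format; adjust both as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "robinlee99/Pythia-2.8B-TLDR-Iterative-SamPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# TL;DR models are typically prompted with the post followed by "TL;DR:".
prompt = "POST: I spent the weekend refactoring our data pipeline...\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated continuation, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
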
## Performance

| vs. SFT | Win rate (%) | Avg. length (tokens) |
| ------ | ------ | ------ |
| DPO | 60.98 | 53.8 |
| Iterative DPO | **73.58** | 66.65 |
| Length Normed DPO | 58.13 | 47.34 |
| SimPO | 33.33 | **31.9** |
| Iterative SamPO | **73.58** | 49.54 |

## Evaluation Details

We test our model with the same GPT-4 win-rate prompt template proposed in the [DPO paper](https://arxiv.org/pdf/2305.18290). The [sampled test set](https://huggingface.co/robinlee99/Pythia-2.8B-TLDR-Iterative-SamPO/blob/main/test_tldr.jsonl) is included in this repo.
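As a hedged sketch, the test set can be read line-by-line as JSON; the per-record schema is not documented here, so inspect a record before relying on any field names:

```python
# Hedged sketch: read the sampled TL;DR test set shipped with this repo.
# The per-record field names are an assumption-free unknown; print one to check.
import json

with open("test_tldr.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))         # number of sampled test prompts
print(sorted(examples[0]))   # actual keys of the first record
```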