chujiezheng/Mistral7B-PairRM-SPPO-ExPO

chujiezheng committed on Jun 1, 2024

Commit d3e8342 · verified · 1 Parent(s): dba98d9

Update README.md

Files changed (1)

README.md +1 -1

@@ -10,7 +10,7 @@ The extrapolated (ExPO) model based on [`UCLA-AGI/Mistral7B-PairRM-SPPO`](https:
 Specifically, we obtain this model by extrapolating **(alpha = 0.3)** from the weights of the SFT and DPO/RLHF checkpoints, achieving superior alignment with human preference.
-This model achieves the **35.4%** win rate and **31.8%** LC win rate on **AlpacaEval 2.0**.
+This extrapolated model achieves the **35.4%** win rate and **31.8%** LC win rate on **AlpacaEval 2.0**, outperforming the original `Mistral7B-PairRM-SPPO`'s 32.2% and 30.5%, respectively.
 ## Evaluation Results
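
The extrapolation step mentioned in the README text can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the usual ExPO update `aligned + alpha * (aligned - sft)` applied to each parameter, and it uses plain Python lists in place of real model tensors. The function name `expo_extrapolate` and the toy parameter name `"w"` are hypothetical.

```python
def expo_extrapolate(sft_weights, aligned_weights, alpha=0.3):
    """Extrapolate past the aligned checkpoint, away from the SFT checkpoint.

    Assumed update rule (not confirmed by this commit):
        new = aligned + alpha * (aligned - sft)
    Both arguments map parameter names to flat lists of floats.
    """
    if sft_weights.keys() != aligned_weights.keys():
        raise ValueError("checkpoints must share the same parameter names")

    extrapolated = {}
    for name, aligned in aligned_weights.items():
        sft = sft_weights[name]
        if len(sft) != len(aligned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move each value a further alpha-sized step along (aligned - sft).
        extrapolated[name] = [a + alpha * (a - s) for a, s in zip(aligned, sft)]
    return extrapolated


# Toy checkpoints standing in for the SFT and DPO/RLHF weights.
sft = {"w": [1.0, 2.0]}
aligned = {"w": [2.0, 2.0]}
result = expo_extrapolate(sft, aligned, alpha=0.3)
```

With `alpha = 0` the function returns the aligned weights unchanged, so `alpha` controls how far the result moves beyond the aligned checkpoint.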