allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm


hamishivi committed on Jun 13

Commit 754fedb • 1 Parent(s): c9c93e5

Update README.md

Files changed (1)

README.md +1 -1


@@ -52,7 +52,7 @@ For details on training and evaluation, read [our paper](https://link.todo)!
 | **Tulu V2.5 PPO 13B (this model)** | 13B | PPO with 70B RM | 58.0 | **26.7** | 62.8 |
 | **Tulu V2 DPO 13B** | 13B | DPO | 50.5 | 16.0 | 61.0 |
 | **Tulu V2 SFT 13B** | 13B | - | 46.0 | 10.4 | 62.8 |
-| **Tulu V2 DPO 70B** | 13B | DPO | **71.5** | 21.2 | **69.4** |
+| **Tulu V2 DPO 70B** | 70B | DPO | **71.5** | 21.2 | **69.4** |
 ## Input Format