tlc4418 committed on
Commit 0aa9475
1 Parent(s): c9d8288

Create README.md

Pythia 1.4B model after supervised fine-tuning (SFT) on the 'sft' split of the AlpacaFarm dataset.

Policy model from '[Reward Model Ensembles Mitigate Overoptimization](https://arxiv.org/abs/2310.02743)'

Files changed (1)
  1. README.md +4 -0
README.md ADDED
@@ -0,0 +1,4 @@
+---
+datasets:
+- tatsu-lab/alpaca_farm
+---
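The four added lines form the YAML front matter of the model card, which records the training dataset as metadata. As a rough illustration of how such a block can be read, the sketch below parses it with a small hand-written function. The name `parse_front_matter` is hypothetical and handles only the simple `key:` plus `- item` shape shown in this commit; a real pipeline would more likely use a full YAML parser.

```python
def parse_front_matter(text):
    """Parse a minimal '---'-delimited block of 'key:' lines and '- item' entries.

    Hypothetical helper for illustration only; it does not implement YAML.
    """
    lines = text.splitlines()
    # Front matter must start on the first line with '---'.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter: stop reading metadata.
            break
        if stripped.startswith("- ") and key is not None:
            # List item belonging to the most recent key.
            meta[key].append(stripped[2:].strip())
        elif stripped.endswith(":"):
            # New key that introduces a list.
            key = stripped[:-1]
            meta[key] = []
    return meta


# README.md content as added by this commit.
README = "---\ndatasets:\n- tatsu-lab/alpaca_farm\n---\n"

metadata = parse_front_matter(README)
print(metadata)
```

The result maps the `datasets` key to a one-element list containing the dataset identifier from the diff.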