tlc4418 committed on
Commit
7b927d2
1 Parent(s): 0aa9475

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -1,4 +1,7 @@
 ---
 datasets:
 - tatsu-lab/alpaca_farm
----
+---
+1.4b Pythia model after SFT on the AlpacaFarm dataset 'sft' split.
+
+Policy model from '[Reward Model Ensembles Mitigate Overoptimization](https://arxiv.org/abs/2310.02743)'