Update README.md
README.md
CHANGED
@@ -1,4 +1,7 @@
 ---
 datasets:
 - tatsu-lab/alpaca_farm
----
+---
+1.4b Pythia model after SFT on the AlpacaFarm dataset 'sft' split.
+
+Policy model from '[Reward Model Ensembles Mitigate Overoptimization](https://arxiv.org/abs/2310.02743)'