Pythia-2.8b supervised fine-tuned on the Anthropic-hh-rlhf dataset for 1 epoch (sft-model), then trained with DPO (paper) on the same dataset for 1 epoch.

Training logs are available on wandb.

Benchmark evaluations included in this repo were done using the lm-evaluation-harness.
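For reference, an evaluation run like those in this repo can be reproduced with the lm-evaluation-harness CLI along these lines (a sketch only; the exact tasks, few-shot settings, and batch size used for the reported numbers are assumptions, not taken from this card):

```shell
# Hypothetical invocation of EleutherAI's lm-evaluation-harness (v0.4+ CLI).
# Task list and batch size are illustrative placeholders.
lm_eval --model hf \
    --model_args pretrained=lomahony/eleuther-pythia2.8b-hh-dpo \
    --tasks lambada_openai,arc_easy \
    --batch_size 8
```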

See Pythia-2.8b for original model details (paper).
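The model can be loaded like any causal LM on the Hub. A minimal sketch, assuming the standard transformers API (the prompt format shown is the usual Anthropic-hh "Human:/Assistant:" convention, which is an assumption about how this checkpoint was trained):

```python
# Hypothetical usage sketch; model download requires network access.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lomahony/eleuther-pythia2.8b-hh-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Anthropic-hh style prompt (assumed format).
prompt = "Human: How do I bake bread?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```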
