Tags: PEFT, Safetensors, mixtral, alignment-handbook, trl, dpo, Generated from Trainer, 4-bit precision, bitsandbytes
stealth-finance-v2-dpo-adapter / train_results.json
Commit c7f70d3 (verified) by jan-hq: "Model save"
{
"epoch": 1.0,
"train_loss": 0.3931771684165408,
"train_runtime": 248473.0607,
"train_samples": 209976,
"train_samples_per_second": 0.845,
"train_steps_per_second": 0.013
}
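The throughput fields are derived from the others, so they can be cross-checked. The sketch below assumes that `train_runtime` is in seconds and that the `*_per_second` values are rounded to three decimal places, as the Hugging Face Trainer's speed metrics do. The helper name `summarize` is hypothetical. Because `train_steps_per_second` (0.013) is itself rounded, the total optimizer step count can only be bounded, not recovered exactly.

```python
import json

# Contents of train_results.json, copied verbatim from above.
RAW = """{
  "epoch": 1.0,
  "train_loss": 0.3931771684165408,
  "train_runtime": 248473.0607,
  "train_samples": 209976,
  "train_samples_per_second": 0.845,
  "train_steps_per_second": 0.013
}"""


def summarize(metrics: dict) -> dict:
    """Hypothetical helper: derive readable figures from the raw metrics."""
    runtime_s = metrics["train_runtime"]  # assumed to be seconds
    # Recompute throughput at full precision (reported value is rounded).
    samples_per_s = metrics["train_samples"] * metrics["epoch"] / runtime_s
    # The reported steps/s is rounded to 3 decimals, so the true value lies
    # within +/- 0.0005 of it; multiply by runtime to bound the step count.
    steps_rate = metrics["train_steps_per_second"]
    steps_lo = (steps_rate - 0.0005) * runtime_s
    steps_hi = (steps_rate + 0.0005) * runtime_s
    return {
        "runtime_hours": runtime_s / 3600,
        "samples_per_second": samples_per_s,
        "approx_steps_range": (steps_lo, steps_hi),
    }


summary = summarize(json.loads(RAW))
print(f"runtime: {summary['runtime_hours']:.2f} h")
print(f"samples/s: {summary['samples_per_second']:.5f}")
lo, hi = summary["approx_steps_range"]
print(f"optimizer steps: roughly {lo:.0f} to {hi:.0f}")
```

Under these assumptions the single epoch took about 69 hours of wall-clock time, and 209976 / 248473.0607 ≈ 0.84507 samples per second, which rounds to the reported 0.845.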