Trained for one epoch on ultrafeedback_binarized using cDPO. Evaluation pending.
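
For readers unfamiliar with the objective, here is a minimal sketch of the per-example loss, assuming "cDPO" means conservative DPO: the DPO loss with label smoothing, which treats each preference label as flipped with probability `label_smoothing`. The function names and the `beta` and `label_smoothing` defaults are illustrative assumptions. The values actually used to train this model are not stated here.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not taken from this model card
    label_smoothing: float = 0.1,  # assumed value, not taken from this model card
) -> float:
    """Conservative DPO loss for a single preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    completions under the policy and under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # Mix the loss for the given label with the loss for the flipped label.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * margin)
        - label_smoothing * log_sigmoid(-beta * margin)
    )
```

With `label_smoothing=0.0` this reduces to the standard DPO loss. A nonzero value keeps the loss from driving the margin to infinity on pairs whose labels may be noisy.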

Some initial benchmark results:

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± 0.0047 |
| | | acc_norm | 0.8525 | ± 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± 0.0141 |
| | | acc_norm | 0.6698 | ± 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± 0.0136 |

Model size: 7.24B params (Safetensors, tensor type BF16)
