zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a sketch of how the reward statistics are computed follows the list):
- Loss: 0.6812
- Rewards/chosen: -1.9326
- Rewards/rejected: -2.2832
- Rewards/accuracies: 0.6797
- Rewards/margins: 0.3506
- Logps/rejected: -490.9798
- Logps/chosen: -455.8898
- Logits/rejected: -0.1744
- Logits/chosen: -0.3072
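The reward fields above are the implicit DPO rewards, i.e. β-scaled log-probability ratios between the policy and the SFT reference model, and "margins" is the chosen-minus-rejected gap. The card does not state the training code or β, so the following is only a minimal sketch assuming the standard DPO formulation (as logged, for example, by TRL's DPOTrainer); the β value shown is illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_stats(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Implicit DPO rewards and loss from summed per-sequence log-probs.

    beta=0.01 is illustrative only; this card does not report the value used.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)      # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = chosen_rewards - rejected_rewards                           # rewards/margins
    loss = -F.logsigmoid(margins).mean()                                  # DPO loss
    accuracy = (chosen_rewards > rejected_rewards).float().mean()         # rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```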
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch restating them follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
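The training script itself is not included in this card. The sketch below only restates the list above as a transformers.TrainingArguments object; the output_dir is illustrative, the DPO-specific settings (such as β) are not reported here, and the multi-GPU launch (8 devices) is handled outside this snippet:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the listed hyperparameters.
# Effective train batch size: 8 devices x 8 per-device x 2 accumulation steps = 128.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs",  # illustrative
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=55,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```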
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6871 | 0.3484 | 50 | 0.6798 | -0.0015 | -0.0300 | 0.5977 | 0.0285 | -265.6574 | -262.7750 | -2.5993 | -2.6328 |
0.6724 | 0.6969 | 100 | 0.6721 | -0.0365 | -0.0949 | 0.5938 | 0.0584 | -272.1548 | -266.2806 | -2.4994 | -2.5340 |
0.6047 | 1.0453 | 150 | 0.6797 | -0.1660 | -0.2332 | 0.5898 | 0.0673 | -285.9855 | -279.2270 | -2.5025 | -2.5443 |
0.5265 | 1.3937 | 200 | 0.6762 | -0.5743 | -0.7331 | 0.6719 | 0.1588 | -335.9708 | -320.0576 | -2.2718 | -2.3328 |
0.4984 | 1.7422 | 250 | 0.6732 | -1.2121 | -1.4445 | 0.6562 | 0.2325 | -407.1154 | -383.8381 | -1.4451 | -1.5433 |
0.3569 | 2.0906 | 300 | 0.6527 | -1.3455 | -1.6681 | 0.6758 | 0.3226 | -429.4680 | -397.1805 | -0.8708 | -0.9999 |
0.3329 | 2.4390 | 350 | 0.6840 | -1.9045 | -2.2570 | 0.6602 | 0.3525 | -488.3670 | -453.0816 | -0.1084 | -0.2447 |
0.3368 | 2.7875 | 400 | 0.6813 | -1.9317 | -2.2848 | 0.6797 | 0.3531 | -491.1398 | -455.8003 | -0.1808 | -0.3104 |
Framework versions
- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
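A minimal usage sketch with the pinned libraries above. The chat template is assumed to be inherited from the zephyr-7b-sft-full tokenizer, and the generation settings are illustrative, not taken from this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sfulay/zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```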
Model tree for sfulay/zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: alignment-handbook/zephyr-7b-sft-full