dpo

This model is a fine-tuned version of /leonardo_scratch/fast/IscrC_ItaLLM_0/tweety_models/sft on the giux78/ultrafeedback-binarized-preferences-cleaned-ita dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6931
  • Rewards/chosen: -0.0430
  • Rewards/rejected: -0.0430
  • Rewards/accuracies: 0.0
  • Rewards/margins: 0.0
  • Logps/rejected: -310.7832
  • Logps/chosen: -310.7832
  • Logits/rejected: -2.3909
  • Logits/chosen: -2.3909
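
For context on these flat numbers (a note not stated in the original card): an evaluation loss of 0.6931 ≈ ln 2 together with zero reward margins is exactly what the DPO objective yields when the chosen and rejected rewards coincide, since the loss then reduces to -log σ(0). The reported accuracy of 0.0 follows from the same tie: with identical chosen and rejected rewards, no pair has a strictly positive margin.

```latex
% DPO loss for a preference pair (x, y_w, y_l); beta is the DPO temperature
% (its value is not reported in this card).
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
% When the margin inside the sigmoid is 0, the loss is
% -log(sigma(0)) = -log(1/2) = ln 2 \approx 0.6931, matching the values above.
```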

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
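
A minimal sketch of how these values map onto Hugging Face TrainingArguments is given below. It is a reconstruction for illustration only: the output directory is hypothetical, and DPO-specific options such as the β temperature are not reported in this card. In TRL, such arguments would typically be passed to DPOTrainer together with the SFT model, a reference model, and the preference dataset.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
# output_dir is hypothetical; DPO-specific settings (e.g. the beta temperature)
# are not reported in the card and are therefore omitted.
training_args = TrainingArguments(
    output_dir="dpo",                  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # 4 x 4 = total train batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                    # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```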

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0292 | 100 | -2.3941 | -2.3941 | -306.3899 | -306.3899 | 0.6931 | 0.0 | 0.0009 | 0.0 | 0.0009 |
| 0.6931 | 0.0584 | 200 | -2.3946 | -2.3946 | -306.5539 | -306.5539 | 0.6931 | 0.0 | -0.0008 | 0.0 | -0.0008 |
| 0.6931 | 0.0876 | 300 | -2.3942 | -2.3942 | -307.0490 | -307.0490 | 0.6931 | 0.0 | -0.0057 | 0.0 | -0.0057 |
| 0.6931 | 0.1168 | 400 | -2.3940 | -2.3940 | -307.3796 | -307.3796 | 0.6931 | 0.0 | -0.0090 | 0.0 | -0.0090 |
| 0.6931 | 0.1460 | 500 | -2.3937 | -2.3937 | -307.1581 | -307.1581 | 0.6931 | 0.0 | -0.0068 | 0.0 | -0.0068 |
| 0.6931 | 0.1751 | 600 | -2.3950 | -2.3950 | -306.9631 | -306.9631 | 0.6931 | 0.0 | -0.0048 | 0.0 | -0.0048 |
| 0.6931 | 0.2043 | 700 | -2.3949 | -2.3949 | -307.6349 | -307.6349 | 0.6931 | 0.0 | -0.0116 | 0.0 | -0.0116 |
| 0.6931 | 0.2335 | 800 | -2.3947 | -2.3947 | -307.6957 | -307.6957 | 0.6931 | 0.0 | -0.0122 | 0.0 | -0.0122 |
| 0.6931 | 0.2627 | 900 | -2.3968 | -2.3968 | -307.1708 | -307.1708 | 0.6931 | 0.0 | -0.0069 | 0.0 | -0.0069 |
| 0.6931 | 0.2919 | 1000 | -2.3967 | -2.3967 | -308.2130 | -308.2130 | 0.6931 | 0.0 | -0.0173 | 0.0 | -0.0173 |
| 0.6931 | 0.3211 | 1100 | -2.3971 | -2.3971 | -309.4724 | -309.4724 | 0.6931 | 0.0 | -0.0299 | 0.0 | -0.0299 |
| 0.6931 | 0.3503 | 1200 | -2.3976 | -2.3976 | -310.0194 | -310.0194 | 0.6931 | 0.0 | -0.0354 | 0.0 | -0.0354 |
| 0.6931 | 0.3795 | 1300 | -2.3963 | -2.3963 | -309.5114 | -309.5114 | 0.6931 | 0.0 | -0.0303 | 0.0 | -0.0303 |
| 0.6931 | 0.4087 | 1400 | -2.3955 | -2.3955 | -309.2061 | -309.2061 | 0.6931 | 0.0 | -0.0273 | 0.0 | -0.0273 |
| 0.6931 | 0.4379 | 1500 | -2.3943 | -2.3943 | -308.9652 | -308.9652 | 0.6931 | 0.0 | -0.0249 | 0.0 | -0.0249 |
| 0.6931 | 0.4671 | 1600 | -2.3954 | -2.3954 | -309.1586 | -309.1586 | 0.6931 | 0.0 | -0.0268 | 0.0 | -0.0268 |
| 0.6931 | 0.4962 | 1700 | -2.3913 | -2.3913 | -309.4055 | -309.4055 | 0.6931 | 0.0 | -0.0293 | 0.0 | -0.0293 |
| 0.6931 | 0.5254 | 1800 | -2.3927 | -2.3927 | -310.2643 | -310.2643 | 0.6931 | 0.0 | -0.0379 | 0.0 | -0.0379 |
| 0.6931 | 0.5546 | 1900 | -2.3927 | -2.3927 | -310.4164 | -310.4164 | 0.6931 | 0.0 | -0.0394 | 0.0 | -0.0394 |
| 0.6931 | 0.5838 | 2000 | -2.3920 | -2.3920 | -310.4427 | -310.4427 | 0.6931 | 0.0 | -0.0396 | 0.0 | -0.0396 |
| 0.6931 | 0.6130 | 2100 | -2.3901 | -2.3901 | -310.7150 | -310.7150 | 0.6931 | 0.0 | -0.0424 | 0.0 | -0.0424 |
| 0.6931 | 0.6422 | 2200 | -2.3911 | -2.3911 | -311.0310 | -311.0310 | 0.6931 | 0.0 | -0.0455 | 0.0 | -0.0455 |
| 0.6931 | 0.6714 | 2300 | -2.3912 | -2.3912 | -310.7881 | -310.7881 | 0.6931 | 0.0 | -0.0431 | 0.0 | -0.0431 |
| 0.6931 | 0.7006 | 2400 | -2.3899 | -2.3899 | -310.6455 | -310.6455 | 0.6931 | 0.0 | -0.0417 | 0.0 | -0.0417 |
| 0.6931 | 0.7298 | 2500 | -2.3915 | -2.3915 | -310.8196 | -310.8196 | 0.6931 | 0.0 | -0.0434 | 0.0 | -0.0434 |
| 0.6931 | 0.7590 | 2600 | -2.3919 | -2.3919 | -310.8546 | -310.8546 | 0.6931 | 0.0 | -0.0438 | 0.0 | -0.0438 |
| 0.6931 | 0.7881 | 2700 | -2.3916 | -2.3916 | -310.8407 | -310.8407 | 0.6931 | 0.0 | -0.0436 | 0.0 | -0.0436 |
| 0.6931 | 0.8173 | 2800 | -2.3915 | -2.3915 | -310.7981 | -310.7981 | 0.6931 | 0.0 | -0.0432 | 0.0 | -0.0432 |
| 0.6931 | 0.8465 | 2900 | -2.3920 | -2.3920 | -310.7943 | -310.7943 | 0.6931 | 0.0 | -0.0432 | 0.0 | -0.0432 |
| 0.6931 | 0.8757 | 3000 | -2.3918 | -2.3918 | -310.7866 | -310.7866 | 0.6931 | 0.0 | -0.0431 | 0.0 | -0.0431 |
| 0.6931 | 0.9049 | 3100 | -2.3908 | -2.3908 | -310.7794 | -310.7794 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9341 | 3200 | -2.3911 | -2.3911 | -310.7812 | -310.7812 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9633 | 3300 | -2.3915 | -2.3915 | -310.7767 | -310.7767 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9925 | 3400 | -2.3909 | -2.3909 | -310.7832 | -310.7832 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.40.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
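
Since this checkpoint is published as a PEFT adapter, a minimal loading sketch under the pinned versions above could look as follows. The base-model identifier is a placeholder (the card only references a local SFT checkpoint path), the adapter id is the repository name on the Hugging Face Hub, and the Italian prompt is just an example.

```python
# Minimal loading sketch, assuming the pinned library versions listed above.
# BASE_MODEL is a placeholder: substitute the SFT base model this adapter
# was trained on (only a local path is given in the card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "path/or/hub-id-of-the-sft-base-model"   # placeholder, not from the card
ADAPTER_ID = "g8a9/tweety-mistral-7b-dpo"             # this adapter on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_ID)   # attach the DPO adapter

prompt = "Scrivi una breve poesia sull'estate."       # example Italian prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```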