dpo

This model is a fine-tuned version of /leonardo_scratch/fast/IscrC_ItaLLM_0/tweety_models/sft on the giux78/ultrafeedback-binarized-preferences-cleaned-ita dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6931
  • Rewards/chosen: -0.0430
  • Rewards/rejected: -0.0430
  • Rewards/accuracies: 0.0
  • Rewards/margins: 0.0
  • Logps/rejected: -310.7832
  • Logps/chosen: -310.7832
  • Logits/rejected: -2.3909
  • Logits/chosen: -2.3909
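
For context on these flat numbers (a note not stated in the original card): an evaluation loss of 0.6931 ≈ ln 2 together with zero reward margins is exactly what the DPO objective yields when the chosen and rejected rewards coincide, since the loss then reduces to -log σ(0). The reported accuracy of 0.0 follows from the same tie: with identical chosen and rejected rewards, no pair has a strictly positive margin.

```latex
% DPO loss for a preference pair (x, y_w, y_l); beta is the DPO temperature
% (its value is not reported in this card).
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
% When the margin inside the sigmoid is 0, the loss is
% -log(sigma(0)) = -log(1/2) = ln 2 \approx 0.6931, matching the values above.
```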

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
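
A minimal sketch of how these values map onto Hugging Face TrainingArguments is given below. It is a reconstruction for illustration only: the output directory is hypothetical, and DPO-specific options such as the β temperature are not reported in this card. In TRL, such arguments would typically be passed to DPOTrainer together with the SFT model, a reference model, and the preference dataset.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
# output_dir is hypothetical; DPO-specific settings (e.g. the beta temperature)
# are not reported in the card and are therefore omitted.
training_args = TrainingArguments(
    output_dir="dpo",                  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # 4 x 4 = total train batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                    # Adam settings as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```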

Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0292 | 100 | -2.3941 | -2.3941 | -306.3899 | -306.3899 | 0.6931 | 0.0 | 0.0009 | 0.0 | 0.0009 |
| 0.6931 | 0.0584 | 200 | -2.3946 | -2.3946 | -306.5539 | -306.5539 | 0.6931 | 0.0 | -0.0008 | 0.0 | -0.0008 |
| 0.6931 | 0.0876 | 300 | -2.3942 | -2.3942 | -307.0490 | -307.0490 | 0.6931 | 0.0 | -0.0057 | 0.0 | -0.0057 |
| 0.6931 | 0.1168 | 400 | -2.3940 | -2.3940 | -307.3796 | -307.3796 | 0.6931 | 0.0 | -0.0090 | 0.0 | -0.0090 |
| 0.6931 | 0.1460 | 500 | -2.3937 | -2.3937 | -307.1581 | -307.1581 | 0.6931 | 0.0 | -0.0068 | 0.0 | -0.0068 |
| 0.6931 | 0.1751 | 600 | -2.3950 | -2.3950 | -306.9631 | -306.9631 | 0.6931 | 0.0 | -0.0048 | 0.0 | -0.0048 |
| 0.6931 | 0.2043 | 700 | -2.3949 | -2.3949 | -307.6349 | -307.6349 | 0.6931 | 0.0 | -0.0116 | 0.0 | -0.0116 |
| 0.6931 | 0.2335 | 800 | -2.3947 | -2.3947 | -307.6957 | -307.6957 | 0.6931 | 0.0 | -0.0122 | 0.0 | -0.0122 |
| 0.6931 | 0.2627 | 900 | -2.3968 | -2.3968 | -307.1708 | -307.1708 | 0.6931 | 0.0 | -0.0069 | 0.0 | -0.0069 |
| 0.6931 | 0.2919 | 1000 | -2.3967 | -2.3967 | -308.2130 | -308.2130 | 0.6931 | 0.0 | -0.0173 | 0.0 | -0.0173 |
| 0.6931 | 0.3211 | 1100 | -2.3971 | -2.3971 | -309.4724 | -309.4724 | 0.6931 | 0.0 | -0.0299 | 0.0 | -0.0299 |
| 0.6931 | 0.3503 | 1200 | -2.3976 | -2.3976 | -310.0194 | -310.0194 | 0.6931 | 0.0 | -0.0354 | 0.0 | -0.0354 |
| 0.6931 | 0.3795 | 1300 | -2.3963 | -2.3963 | -309.5114 | -309.5114 | 0.6931 | 0.0 | -0.0303 | 0.0 | -0.0303 |
| 0.6931 | 0.4087 | 1400 | -2.3955 | -2.3955 | -309.2061 | -309.2061 | 0.6931 | 0.0 | -0.0273 | 0.0 | -0.0273 |
| 0.6931 | 0.4379 | 1500 | -2.3943 | -2.3943 | -308.9652 | -308.9652 | 0.6931 | 0.0 | -0.0249 | 0.0 | -0.0249 |
| 0.6931 | 0.4671 | 1600 | -2.3954 | -2.3954 | -309.1586 | -309.1586 | 0.6931 | 0.0 | -0.0268 | 0.0 | -0.0268 |
| 0.6931 | 0.4962 | 1700 | -2.3913 | -2.3913 | -309.4055 | -309.4055 | 0.6931 | 0.0 | -0.0293 | 0.0 | -0.0293 |
| 0.6931 | 0.5254 | 1800 | -2.3927 | -2.3927 | -310.2643 | -310.2643 | 0.6931 | 0.0 | -0.0379 | 0.0 | -0.0379 |
| 0.6931 | 0.5546 | 1900 | -2.3927 | -2.3927 | -310.4164 | -310.4164 | 0.6931 | 0.0 | -0.0394 | 0.0 | -0.0394 |
| 0.6931 | 0.5838 | 2000 | -2.3920 | -2.3920 | -310.4427 | -310.4427 | 0.6931 | 0.0 | -0.0396 | 0.0 | -0.0396 |
| 0.6931 | 0.6130 | 2100 | -2.3901 | -2.3901 | -310.7150 | -310.7150 | 0.6931 | 0.0 | -0.0424 | 0.0 | -0.0424 |
| 0.6931 | 0.6422 | 2200 | -2.3911 | -2.3911 | -311.0310 | -311.0310 | 0.6931 | 0.0 | -0.0455 | 0.0 | -0.0455 |
| 0.6931 | 0.6714 | 2300 | -2.3912 | -2.3912 | -310.7881 | -310.7881 | 0.6931 | 0.0 | -0.0431 | 0.0 | -0.0431 |
| 0.6931 | 0.7006 | 2400 | -2.3899 | -2.3899 | -310.6455 | -310.6455 | 0.6931 | 0.0 | -0.0417 | 0.0 | -0.0417 |
| 0.6931 | 0.7298 | 2500 | -2.3915 | -2.3915 | -310.8196 | -310.8196 | 0.6931 | 0.0 | -0.0434 | 0.0 | -0.0434 |
| 0.6931 | 0.7590 | 2600 | -2.3919 | -2.3919 | -310.8546 | -310.8546 | 0.6931 | 0.0 | -0.0438 | 0.0 | -0.0438 |
| 0.6931 | 0.7881 | 2700 | -2.3916 | -2.3916 | -310.8407 | -310.8407 | 0.6931 | 0.0 | -0.0436 | 0.0 | -0.0436 |
| 0.6931 | 0.8173 | 2800 | -2.3915 | -2.3915 | -310.7981 | -310.7981 | 0.6931 | 0.0 | -0.0432 | 0.0 | -0.0432 |
| 0.6931 | 0.8465 | 2900 | -2.3920 | -2.3920 | -310.7943 | -310.7943 | 0.6931 | 0.0 | -0.0432 | 0.0 | -0.0432 |
| 0.6931 | 0.8757 | 3000 | -2.3918 | -2.3918 | -310.7866 | -310.7866 | 0.6931 | 0.0 | -0.0431 | 0.0 | -0.0431 |
| 0.6931 | 0.9049 | 3100 | -2.3908 | -2.3908 | -310.7794 | -310.7794 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9341 | 3200 | -2.3911 | -2.3911 | -310.7812 | -310.7812 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9633 | 3300 | -2.3915 | -2.3915 | -310.7767 | -310.7767 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |
| 0.6931 | 0.9925 | 3400 | -2.3909 | -2.3909 | -310.7832 | -310.7832 | 0.6931 | 0.0 | -0.0430 | 0.0 | -0.0430 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.40.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
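
Since this checkpoint is published as a PEFT adapter, a minimal loading sketch under the pinned versions above could look as follows. The base-model identifier is a placeholder (the card only references a local SFT checkpoint path), the adapter id is the repository name on the Hugging Face Hub, and the Italian prompt is just an example.

```python
# Minimal loading sketch, assuming the pinned library versions listed above.
# BASE_MODEL is a placeholder: substitute the SFT base model this adapter
# was trained on (only a local path is given in the card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "path/or/hub-id-of-the-sft-base-model"   # placeholder, not from the card
ADAPTER_ID = "g8a9/tweety-mistral-7b-dpo"             # this adapter on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_ID)   # attach the DPO adapter

prompt = "Scrivi una breve poesia sull'estate."       # example Italian prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```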