
Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V5

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified dataset; the repository contains a PEFT adapter rather than full model weights (a loading sketch follows the metrics below). It achieves the following results on the evaluation set:

  • Loss: 1.0059
  • Rewards/chosen: -1.9822
  • Rewards/rejected: -2.2494
  • Rewards/accuracies: 0.6000
  • Rewards/margins: 0.2673
  • Logps/rejected: -163.8624
  • Logps/chosen: -165.7420
  • Logits/rejected: -0.1662
  • Logits/chosen: -0.1805
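
Because this repository contains a PEFT adapter rather than a full checkpoint, it has to be applied on top of the base meta-llama/Llama-2-7b-hf model. A minimal loading sketch, assuming the peft and transformers versions listed under Framework versions below:

```python
# Minimal sketch: load the gated base model, then attach this adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V5"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attaches adapter weights

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```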

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
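
The training script itself is not included on the card. As an illustration only, the following sketch shows how these hyperparameters would map onto TRL's DPOTrainer; the trainer, the dataset, and the `model`/`tokenizer` objects (from the loading sketch above) are assumptions, not documented details of this run.

```python
# Hypothetical training setup mapping the documented hyperparameters onto
# trl's DPOTrainer. train_dataset/eval_dataset are placeholders for a
# preference dataset with prompt/chosen/rejected columns.
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="llama2-7b-dpo",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # effective train batch size: 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    adam_beta1=0.9,                 # Adam betas/epsilon as reported above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = DPOTrainer(
    model=model,                  # PEFT-wrapped policy; the reference model is implicit
    args=config,
    train_dataset=train_dataset,  # placeholder
    eval_dataset=eval_dataset,    # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```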

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7395        | 0.3010 | 73   | 0.6468          | 0.0134         | -0.0847          | 0.9000             | 0.0981          | -142.2149      | -145.7866    | 0.3794          | 0.3670        |
| 0.7285        | 0.6021 | 146  | 0.6128          | 0.0518         | -0.1414          | 0.7000             | 0.1932          | -142.7814      | -145.4018    | 0.3432          | 0.3316        |
| 0.5488        | 0.9031 | 219  | 0.5896          | 0.0505         | -0.2094          | 0.8000             | 0.2599          | -143.4620      | -145.4151    | 0.3212          | 0.3092        |
| 0.4181        | 1.2041 | 292  | 0.7451          | -0.5895        | -1.0121          | 0.7000             | 0.4226          | -151.4888      | -151.8154    | 0.2582          | 0.2463        |
| 0.6666        | 1.5052 | 365  | 0.6292          | -0.4920        | -0.8706          | 0.5000             | 0.3786          | -150.0739      | -150.8403    | 0.2068          | 0.1950        |
| 0.5649        | 1.8062 | 438  | 0.6652          | -0.6961        | -1.0296          | 0.6000             | 0.3335          | -151.6640      | -152.8809    | 0.1043          | 0.0914        |
| 0.3129        | 2.1072 | 511  | 0.8072          | -1.2644        | -1.5342          | 0.6000             | 0.2698          | -156.7100      | -158.5638    | 0.0071          | -0.0060       |
| 0.0785        | 2.4082 | 584  | 1.0289          | -2.0249        | -2.2745          | 0.6000             | 0.2496          | -164.1127      | -166.1691    | -0.1558         | -0.1700       |
| 0.1698        | 2.7093 | 657  | 1.0059          | -1.9822        | -2.2494          | 0.6000             | 0.2673          | -163.8624      | -165.7420    | -0.1662         | -0.1805       |
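
For reference, the Rewards/* columns follow the standard DPO bookkeeping: a completion's reward is the beta-scaled log-probability ratio between the trained policy and the frozen reference model, the margin is the chosen-minus-rejected reward, and the loss is the negative log-sigmoid of that margin. A minimal sketch (beta and the log-probabilities are illustrative, not values from this run):

```python
# Standard DPO loss and the reward metrics reported in the table above.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Rewards/chosen and Rewards/rejected: beta-scaled policy-vs-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards          # Rewards/margins
    loss = -F.logsigmoid(margins).mean()                 # DPO objective
    accuracy = (margins > 0).float().mean()              # Rewards/accuracies
    return loss, chosen_rewards, rejected_rewards, margins, accuracy
```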

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.19.1