
Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V6

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with Direct Preference Optimization (DPO) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9591
  • Rewards/chosen: -2.8498
  • Rewards/rejected: -3.2567
  • Rewards/accuracies: 0.6000
  • Rewards/margins: 0.4069
  • Logps/rejected: -145.1960
  • Logps/chosen: -111.9480
  • Logits/rejected: 0.1780
  • Logits/chosen: 0.1994
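
For context, the Rewards/* values are the implicit DPO rewards that trl's DPOTrainer logs during evaluation. The sketch below shows how they are derived; the beta of 0.1 is trl's default and an assumption (this card does not report it), and the log-probability tensors are dummies.

```python
import torch

beta = 0.1  # assumption: trl's default DPO beta; not reported in this card

def dpo_reward_metrics(policy_logps_chosen, policy_logps_rejected,
                       ref_logps_chosen, ref_logps_rejected):
    # Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    rewards_chosen = beta * (policy_logps_chosen - ref_logps_chosen)
    rewards_rejected = beta * (policy_logps_rejected - ref_logps_rejected)
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": (rewards_chosen - rewards_rejected).mean().item(),
        # Fraction of pairs where the chosen response out-scores the rejected one
        "rewards/accuracies": (rewards_chosen > rewards_rejected).float().mean().item(),
    }

# Dummy summed per-sequence log-probabilities for a batch of two preference pairs
metrics = dpo_reward_metrics(
    torch.tensor([-85.0, -90.0]), torch.tensor([-120.0, -110.0]),
    torch.tensor([-84.0, -91.0]), torch.tensor([-115.0, -112.0]),
)
```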

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
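
The training script is not included in this card; the sketch below shows one plausible trl DPOTrainer setup matching the hyperparameters above (the Adam betas and epsilon listed are the Transformers defaults). The dataset path and LoRA settings are placeholders, and the trl API shown assumes a version contemporary with Transformers 4.44.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 ships without a pad token

args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V6",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # total train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns
dataset = load_dataset("json", data_files="preference_data.json", split="train")

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # placeholder LoRA settings
)
trainer.train()
```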

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0.6832 | 0.3007 | 69 | 0.6916 | -0.1597 | -0.1889 | 0.4000 | 0.0292 | -114.5179 | -85.0475 | 0.6233 | 0.6414 |
| 0.7529 | 0.6013 | 138 | 0.6560 | -0.2047 | -0.3472 | 0.5000 | 0.1425 | -116.1010 | -85.4976 | 0.6177 | 0.6354 |
| 0.6930 | 0.9020 | 207 | 0.6636 | 0.0291 | -0.0598 | 0.5000 | 0.0889 | -113.2271 | -83.1593 | 0.6143 | 0.6331 |
| 0.4049 | 1.2026 | 276 | 0.6820 | -0.9628 | -1.4793 | 0.5000 | 0.5166 | -127.4224 | -93.0781 | 0.5148 | 0.5312 |
| 0.3698 | 1.5033 | 345 | 0.6524 | -1.3282 | -1.9360 | 0.6000 | 0.6078 | -131.9892 | -96.7321 | 0.4151 | 0.4326 |
| 0.3176 | 1.8039 | 414 | 0.7491 | -1.8527 | -2.3707 | 0.6000 | 0.5180 | -136.3361 | -101.9771 | 0.3469 | 0.3652 |
| 0.3610 | 2.1046 | 483 | 0.8110 | -2.2972 | -2.7632 | 0.5000 | 0.4660 | -140.2609 | -106.4225 | 0.2734 | 0.2932 |
| 0.3286 | 2.4052 | 552 | 0.9465 | -2.7604 | -3.1816 | 0.6000 | 0.4212 | -144.4454 | -111.0542 | 0.1886 | 0.2099 |
| 0.0545 | 2.7059 | 621 | 0.9591 | -2.8498 | -3.2567 | 0.6000 | 0.4069 | -145.1960 | -111.9480 | 0.1780 | 0.1994 |
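
Note that validation loss reaches its minimum (0.6524) around epoch 1.5 and climbs steadily afterward even as training loss keeps falling, so intermediate checkpoints may generalize better than the final one reported above.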

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.19.1
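
How to use

This repository hosts a PEFT adapter rather than full model weights, so it should be loaded on top of the base model. A minimal sketch, assuming a LoRA adapter over meta-llama/Llama-2-7b-hf; the dtype and device settings are illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```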