
Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V2

This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf; the preference dataset used for training is not specified in this card. It achieves the following results on the evaluation set (see the note after the list on how the reward metrics relate):

  • Loss: 0.8499
  • Rewards/chosen: -2.3527
  • Rewards/rejected: -2.7258
  • Rewards/accuracies: 0.5
  • Rewards/margins: 0.3731
  • Logps/rejected: -145.5276
  • Logps/chosen: -177.2292
  • Logits/rejected: -0.0232
  • Logits/chosen: -0.0429
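
Assuming the standard DPO formulation (as in trl's DPOTrainer, which emits these metric names), Rewards/chosen and Rewards/rejected are the implicit rewards beta * log(pi_theta(y|x) / pi_ref(y|x)) on the preferred and dispreferred completions, and Rewards/margins is simply their difference. A quick arithmetic check against the figures above:

```python
# Sanity check: in DPO, Rewards/margins = Rewards/chosen - Rewards/rejected.
chosen, rejected = -2.3527, -2.7258
print(round(chosen - rejected, 4))  # 0.3731, matching Rewards/margins above
```

Note that a positive average margin with Rewards/accuracies at 0.5 means the model separates chosen from rejected on average, but ranks the pair correctly only half the time on this evaluation set.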

Model description

More information needed

Intended uses & limitations

More information needed
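
In the absence of details, here is a minimal usage sketch (not from the card itself). This repository holds a PEFT adapter, so the gated meta-llama/Llama-2-7b-hf base weights are loaded first and the adapter is applied on top:

```python
# A minimal, hedged usage sketch: load the base model, then apply this
# repository's PEFT adapter. Requires access to the gated Llama-2 weights.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(
    base, "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V2"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```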

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
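
As a hedged sketch, this is how the hyperparameters above would map onto trl's DPOConfig; the card does not state which trainer was actually used, and the output path is a placeholder:

```python
# Hypothetical mapping of the listed hyperparameters onto trl's DPOConfig.
# The optimizer line corresponds to the Trainer default (AdamW with
# betas=(0.9, 0.999), eps=1e-8); total_train_batch_size = 2 * 2 = 4.
from trl import DPOConfig

config = DPOConfig(
    output_dir="./dpo-output",        # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,    # effective batch size 4
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
)
```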

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7354 | 0.3029 | 78 | 0.7015 | -0.0064 | 0.0037 | 0.6667 | -0.0101 | -118.2320 | -153.7661 | 0.5634 | 0.5426 |
| 0.6583 | 0.6058 | 156 | 0.7087 | -0.0202 | -0.0023 | 0.5833 | -0.0178 | -118.2927 | -153.9037 | 0.5270 | 0.5061 |
| 0.723 | 0.9087 | 234 | 0.7499 | -0.3620 | -0.3783 | 0.5 | 0.0163 | -122.0522 | -157.3222 | 0.4964 | 0.4745 |
| 0.229 | 1.2117 | 312 | 0.7914 | -0.9616 | -1.0299 | 0.5833 | 0.0683 | -128.5688 | -163.3184 | 0.3901 | 0.3669 |
| 0.603 | 1.5146 | 390 | 0.7363 | -1.3393 | -1.5502 | 0.5 | 0.2109 | -133.7717 | -167.0953 | 0.3080 | 0.2854 |
| 0.1335 | 1.8175 | 468 | 0.7920 | -1.5465 | -1.6888 | 0.4167 | 0.1423 | -135.1577 | -169.1670 | 0.1816 | 0.1612 |
| 0.1427 | 2.1204 | 546 | 0.7712 | -1.7940 | -2.0501 | 0.5 | 0.2561 | -138.7705 | -171.6423 | 0.1192 | 0.0991 |
| 0.2443 | 2.4233 | 624 | 0.8586 | -2.4320 | -2.8184 | 0.5 | 0.3864 | -146.4533 | -178.0219 | -0.0246 | -0.0443 |
| 0.0228 | 2.7262 | 702 | 0.8499 | -2.3527 | -2.7258 | 0.5 | 0.3731 | -145.5276 | -177.2292 | -0.0232 | -0.0429 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.2
  • Tokenizers 0.19.1