dpo-llama-chat-without-none

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf. The training dataset is not specified (the card records it as None). It achieves the following results on the evaluation set:

  • Loss: 4.9481
  • Rewards/chosen: 4.6795
  • Rewards/rejected: 2.8189
  • Rewards/accuracies: 0.8547
  • Rewards/margins: 1.8606
  • Logps/rejected: -60.8495
  • Logps/chosen: -50.0326
  • Logits/rejected: -0.2216
  • Logits/chosen: -0.2323
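
The reported margin matches the difference between the chosen and rejected rewards, which is how DPO-style trainers typically define it. A minimal check of that arithmetic, using only the numbers listed above (the variable names are illustrative and not part of the original card):

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = 4.6795
rewards_rejected = 2.8189
reported_margin = 1.8606

# Assumption: the margin is the chosen reward minus the rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
```

A positive margin means the model assigns a higher implicit reward to chosen responses than to rejected ones on average; the accuracy of 0.8547 is the fraction of evaluation pairs for which that ordering holds.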

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 1000
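
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates the learning-rate curve, assuming linear warmup from zero followed by a half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup. The function `lr_at` is a hypothetical helper written for this illustration; the trainer's exact implementation may differ.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 2   # per device
EVAL_BATCH_SIZE = 2    # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 50
TRAINING_STEPS = 1000

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total train batch size:", total_train_batch_size)
print("total eval batch size:", total_eval_batch_size)
for step in (0, 25, 50, 525, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 at step 50, falls to half of that at step 525, and reaches zero at step 1000.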

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.3 | 0.24 | 100 | 6.1290 | 3.4767 | 3.2110 | 0.5920 | 0.2657 | -56.9286 | -62.0606 | -0.2723 | -0.2654 |
| 5.5843 | 0.48 | 200 | 5.8936 | 3.6904 | 3.2305 | 0.6520 | 0.4599 | -56.7330 | -59.9230 | 0.2517 | 0.2475 |
| 5.757 | 0.72 | 300 | 5.6694 | 3.9164 | 3.1893 | 0.7253 | 0.7271 | -57.1450 | -57.6631 | 0.3505 | 0.3418 |
| 5.5385 | 0.96 | 400 | 5.4629 | 4.1466 | 3.1351 | 0.7600 | 1.0115 | -57.6871 | -55.3611 | 0.2059 | 0.1970 |
| 5.2301 | 1.2 | 500 | 5.2891 | 4.3324 | 3.0305 | 0.7880 | 1.3020 | -58.7338 | -53.5027 | 0.1063 | 0.0968 |
| 5.0115 | 1.44 | 600 | 5.1601 | 4.4582 | 2.9458 | 0.8213 | 1.5124 | -59.5800 | -52.2452 | -0.1082 | -0.1154 |
| 4.9893 | 1.68 | 700 | 5.0431 | 4.5787 | 2.9142 | 0.8413 | 1.6645 | -59.8968 | -51.0404 | -0.1716 | -0.1829 |
| 5.0292 | 1.92 | 800 | 4.9770 | 4.6501 | 2.8827 | 0.8427 | 1.7673 | -60.2111 | -50.3266 | -0.1929 | -0.2042 |
| 4.331 | 2.16 | 900 | 4.9577 | 4.6724 | 2.8191 | 0.8480 | 1.8534 | -60.8478 | -50.1027 | -0.2005 | -0.2121 |
| 4.5481 | 2.4 | 1000 | 4.9481 | 4.6795 | 2.8189 | 0.8547 | 1.8606 | -60.8495 | -50.0326 | -0.2216 | -0.2323 |
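
The card does not state the size of the training set, but the step and epoch columns allow a rough estimate. The sketch below assumes that every optimizer step consumes a full effective batch of 32 examples and that the epoch value of 2.4 at step 1000 is rounded, so the result is approximate.

```python
# Values copied from the hyperparameter list and the final row of the results.
TOTAL_TRAIN_BATCH_SIZE = 32
FINAL_STEP = 1000
FINAL_EPOCH = 2.4  # rounded in the card, so the estimates below are approximate

# Assumption: each optimizer step processes one full effective batch.
examples_seen = TOTAL_TRAIN_BATCH_SIZE * FINAL_STEP
steps_per_epoch = FINAL_STEP / FINAL_EPOCH
examples_per_epoch = examples_seen / FINAL_EPOCH

print(f"examples seen in training: {examples_seen}")
print(f"approximate steps per epoch: {steps_per_epoch:.0f}")
print(f"approximate examples per epoch: {examples_per_epoch:.0f}")
```

Under these assumptions the training set would contain on the order of 13,000 examples, each seen between two and three times over the run.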

Framework versions

  • PEFT 0.8.2
  • Transformers 4.36.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2
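
Since PEFT appears among the framework versions, the weights are presumably published as an adapter applied on top of the base model. A minimal loading sketch follows. It assumes the repository ID shown on the model page, and assumes that the repository contains a standard PEFT adapter; `load_model` and `generate_reply` are hypothetical helper names introduced for this example.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # base model named in this card
# Assumption: repository ID as displayed on the model page.
ADAPTER_ID = "Evan-Lin/positive-chosen-llama-chat-without-none"


def load_model():
    """Load the base model and attach the adapter (needs transformers and peft)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return tokenizer, model


def generate_reply(prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a short completion for the prompt with the adapted model."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("Hello, how are you?")` downloads the base weights on first use. The base model is gated, so access has to be granted on the Hugging Face Hub beforehand.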

Model repository: Evan-Lin/positive-chosen-llama-chat-without-none (an adapter for meta-llama/Llama-2-7b-chat-hf)