Hyponatremia_L3_1000steps_1e5rate_01beta_DPO

This model is a version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0000
  • Rewards/chosen: 1.0768
  • Rewards/rejected: -15.3439
  • Rewards/accuracies: 1.0
  • Rewards/margins: 16.4206
  • Logps/rejected: -192.8655
  • Logps/chosen: -11.9493
  • Logits/rejected: -1.0760
  • Logits/chosen: -0.9787
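
For context, the reward columns above follow the standard DPO bookkeeping: each reward is beta times the log-probability ratio between the policy and the reference (SFT) model on a response, and the margin is the gap between the chosen and rejected rewards. The sketch below is illustrative rather than the actual training code; beta = 0.1 is assumed from the "01beta" tag in the model name.

```python
import torch
import torch.nn.functional as F

def dpo_rewards_and_loss(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of standard DPO reward/margin accounting.

    Inputs are per-sequence summed log-probs (tensors of shape [batch]);
    beta=0.1 is an assumption taken from the model name.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)      # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                           # Rewards/margins
    accuracies = (margins > 0).float().mean()                             # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                  # DPO objective
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies
```

With margins around 16, the sigmoid saturates at 1, which is why the reported loss rounds to 0.0000 and the reward accuracy is 1.0.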

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
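
A minimal sketch of how this configuration might be expressed with TRL's DPOTrainer (the card does not name the training library, so TRL is an assumption, as is a mid-2024 release in which DPOTrainer takes a tokenizer argument; the dataset identifier is a placeholder, since the training data is not documented):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT"  # SFT starting point named in the card
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# Hyperparameters mirror the list above; "some/preference_dataset" is hypothetical.
args = DPOConfig(
    output_dir="Hyponatremia_L3_1000steps_1e5rate_01beta_DPO",
    beta=0.1,                        # "01beta" in the model name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    seed=42,
)

trainer = DPOTrainer(
    model=model,                     # reference model is cloned internally when not passed
    args=args,
    train_dataset=load_dataset("some/preference_dataset", split="train"),  # placeholder
    tokenizer=tokenizer,
)
trainer.train()
```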

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.0           | 0.2667 | 50   | 0.0000          | 0.9553         | -10.8575         | 1.0                | 11.8128         | -148.0022      | -13.1641     | -1.0364         | -0.9611       |
| 0.0           | 0.5333 | 100  | 0.0000          | 0.9519         | -12.4385         | 1.0                | 13.3903         | -163.8113      | -13.1981     | -1.0526         | -0.9728       |
| 0.0           | 0.8    | 150  | 0.0000          | 0.9832         | -13.0797         | 1.0                | 14.0629         | -170.2236      | -12.8844     | -1.0616         | -0.9786       |
| 0.0           | 1.0667 | 200  | 0.0000          | 0.9920         | -13.5014         | 1.0                | 14.4934         | -174.4411      | -12.7967     | -1.0686         | -0.9825       |
| 0.0           | 1.3333 | 250  | 0.0000          | 1.0027         | -13.8298         | 1.0                | 14.8325         | -177.7250      | -12.6903     | -1.0703         | -0.9822       |
| 0.0           | 1.6    | 300  | 0.0000          | 1.0142         | -14.0854         | 1.0                | 15.0996         | -180.2808      | -12.5749     | -1.0721         | -0.9818       |
| 0.0           | 1.8667 | 350  | 0.0000          | 1.0305         | -14.3255         | 1.0                | 15.3560         | -182.6821      | -12.4120     | -1.0734         | -0.9816       |
| 0.0           | 2.1333 | 400  | 0.0000          | 1.0373         | -14.5462         | 1.0                | 15.5835         | -184.8884      | -12.3434     | -1.0740         | -0.9810       |
| 0.0           | 2.4    | 450  | 0.0000          | 1.0509         | -14.7386         | 1.0                | 15.7895         | -186.8133      | -12.2083     | -1.0751         | -0.9810       |
| 0.0           | 2.6667 | 500  | 0.0000          | 1.0573         | -14.8986         | 1.0                | 15.9560         | -188.4131      | -12.1435     | -1.0767         | -0.9816       |
| 0.0           | 2.9333 | 550  | 0.0000          | 1.0640         | -15.0362         | 1.0                | 16.1002         | -189.7889      | -12.0765     | -1.0754         | -0.9801       |
| 0.0           | 3.2    | 600  | 0.0000          | 1.0681         | -15.1438         | 1.0                | 16.2119         | -190.8647      | -12.0355     | -1.0755         | -0.9793       |
| 0.0           | 3.4667 | 650  | 0.0000          | 1.0702         | -15.2094         | 1.0                | 16.2796         | -191.5211      | -12.0146     | -1.0752         | -0.9782       |
| 0.0           | 3.7333 | 700  | 0.0000          | 1.0749         | -15.2717         | 1.0                | 16.3466         | -192.1442      | -11.9678     | -1.0751         | -0.9777       |
| 0.0           | 4.0    | 750  | 0.0000          | 1.0742         | -15.3088         | 1.0                | 16.3831         | -192.5153      | -11.9746     | -1.0760         | -0.9782       |
| 0.0           | 4.2667 | 800  | 0.0000          | 1.0784         | -15.3235         | 1.0                | 16.4019         | -192.6623      | -11.9330     | -1.0748         | -0.9774       |
| 0.0           | 4.5333 | 850  | 0.0000          | 1.0743         | -15.3432         | 1.0                | 16.4175         | -192.8588      | -11.9742     | -1.0748         | -0.9772       |
| 0.0           | 4.8    | 900  | 0.0000          | 1.0767         | -15.3361         | 1.0                | 16.4128         | -192.7881      | -11.9501     | -1.0756         | -0.9780       |
| 0.0           | 5.0667 | 950  | 0.0000          | 1.0768         | -15.3439         | 1.0                | 16.4206         | -192.8655      | -11.9493     | -1.0760         | -0.9787       |
| 0.0           | 5.3333 | 1000 | 0.0000          | 1.0768         | -15.3439         | 1.0                | 16.4206         | -192.8655      | -11.9493     | -1.0760         | -0.9787       |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.0.0+cu117
  • Datasets 2.20.0
  • Tokenizers 0.19.1
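
With the framework versions above, the checkpoint loads like any other causal-LM checkpoint. A usage sketch (not taken from the card; the prompt is a placeholder, since intended uses are not documented):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Hyponatremia_L3_1000steps_1e5rate_01beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # placeholder: supply a domain-appropriate prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```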