# Hyponatremia_L3_1000steps_1e5rate_01beta_DPO
This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: 1.0768
- Rewards/rejected: -15.3439
- Rewards/accuracies: 1.0
- Rewards/margins: 16.4206
- Logps/rejected: -192.8655
- Logps/chosen: -11.9493
- Logits/rejected: -1.0760
- Logits/chosen: -0.9787
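The reward numbers above fit together as expected for DPO: the margin is simply the chosen reward minus the rejected reward, and the reported loss of 0.0000 follows from how large that margin is. A minimal arithmetic check (assuming the standard DPO pairwise loss, -log sigmoid of the reward margin):

```python
import math

# Evaluation metrics reported above.
rewards_chosen = 1.0768
rewards_rejected = -15.3439

# The reward margin is the difference between the two:
# 1.0768 - (-15.3439) = 16.4207, matching the reported 16.4206 up to rounding.
margin = rewards_chosen - rewards_rejected

# Standard DPO pairwise loss is -log(sigmoid(margin)); with a margin this
# large it is far below the 4-decimal display precision, which is why the
# validation loss shows as 0.0000.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(margin, loss)
```

This also explains why rewards/accuracies is 1.0: the chosen completion out-scores the rejected one on every evaluation pair.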
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
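The listed hyperparameters imply an effective batch size of 4 (per-device batch of 2 × 2 gradient-accumulation steps) and a learning rate that ramps up linearly for 100 steps, then decays along a cosine curve to zero at step 1000. A sketch of that schedule shape (assumed to follow the usual linear-warmup cosine decay, as in `transformers`' cosine scheduler; the exact training code is not published here):

```python
import math

# Effective batch size: per-device batch x gradient accumulation steps.
train_batch_size, grad_accum = 2, 2
total_train_batch_size = train_batch_size * grad_accum  # 4, as listed above

def lr_at(step, base_lr=1e-5, warmup_steps=100, training_steps=1000):
    """Linear warmup to base_lr, then cosine decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# lr_at(50) is mid-warmup (5e-6), lr_at(100) is the peak (1e-5),
# and lr_at(1000) has decayed to 0.
```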
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.0 | 0.2667 | 50 | 0.0000 | 0.9553 | -10.8575 | 1.0 | 11.8128 | -148.0022 | -13.1641 | -1.0364 | -0.9611 |
| 0.0 | 0.5333 | 100 | 0.0000 | 0.9519 | -12.4385 | 1.0 | 13.3903 | -163.8113 | -13.1981 | -1.0526 | -0.9728 |
| 0.0 | 0.8 | 150 | 0.0000 | 0.9832 | -13.0797 | 1.0 | 14.0629 | -170.2236 | -12.8844 | -1.0616 | -0.9786 |
| 0.0 | 1.0667 | 200 | 0.0000 | 0.9920 | -13.5014 | 1.0 | 14.4934 | -174.4411 | -12.7967 | -1.0686 | -0.9825 |
| 0.0 | 1.3333 | 250 | 0.0000 | 1.0027 | -13.8298 | 1.0 | 14.8325 | -177.7250 | -12.6903 | -1.0703 | -0.9822 |
| 0.0 | 1.6 | 300 | 0.0000 | 1.0142 | -14.0854 | 1.0 | 15.0996 | -180.2808 | -12.5749 | -1.0721 | -0.9818 |
| 0.0 | 1.8667 | 350 | 0.0000 | 1.0305 | -14.3255 | 1.0 | 15.3560 | -182.6821 | -12.4120 | -1.0734 | -0.9816 |
| 0.0 | 2.1333 | 400 | 0.0000 | 1.0373 | -14.5462 | 1.0 | 15.5835 | -184.8884 | -12.3434 | -1.0740 | -0.9810 |
| 0.0 | 2.4 | 450 | 0.0000 | 1.0509 | -14.7386 | 1.0 | 15.7895 | -186.8133 | -12.2083 | -1.0751 | -0.9810 |
| 0.0 | 2.6667 | 500 | 0.0000 | 1.0573 | -14.8986 | 1.0 | 15.9560 | -188.4131 | -12.1435 | -1.0767 | -0.9816 |
| 0.0 | 2.9333 | 550 | 0.0000 | 1.0640 | -15.0362 | 1.0 | 16.1002 | -189.7889 | -12.0765 | -1.0754 | -0.9801 |
| 0.0 | 3.2 | 600 | 0.0000 | 1.0681 | -15.1438 | 1.0 | 16.2119 | -190.8647 | -12.0355 | -1.0755 | -0.9793 |
| 0.0 | 3.4667 | 650 | 0.0000 | 1.0702 | -15.2094 | 1.0 | 16.2796 | -191.5211 | -12.0146 | -1.0752 | -0.9782 |
| 0.0 | 3.7333 | 700 | 0.0000 | 1.0749 | -15.2717 | 1.0 | 16.3466 | -192.1442 | -11.9678 | -1.0751 | -0.9777 |
| 0.0 | 4.0 | 750 | 0.0000 | 1.0742 | -15.3088 | 1.0 | 16.3831 | -192.5153 | -11.9746 | -1.0760 | -0.9782 |
| 0.0 | 4.2667 | 800 | 0.0000 | 1.0784 | -15.3235 | 1.0 | 16.4019 | -192.6623 | -11.9330 | -1.0748 | -0.9774 |
| 0.0 | 4.5333 | 850 | 0.0000 | 1.0743 | -15.3432 | 1.0 | 16.4175 | -192.8588 | -11.9742 | -1.0748 | -0.9772 |
| 0.0 | 4.8 | 900 | 0.0000 | 1.0767 | -15.3361 | 1.0 | 16.4128 | -192.7881 | -11.9501 | -1.0756 | -0.9780 |
| 0.0 | 5.0667 | 950 | 0.0000 | 1.0768 | -15.3439 | 1.0 | 16.4206 | -192.8655 | -11.9493 | -1.0760 | -0.9787 |
| 0.0 | 5.3333 | 1000 | 0.0000 | 1.0768 | -15.3439 | 1.0 | 16.4206 | -192.8655 | -11.9493 | -1.0760 | -0.9787 |
### Framework versions
- Transformers 4.42.3
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1
## Base model

This model descends from meta-llama/Meta-Llama-3-8B-Instruct, via the SFT checkpoint tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT.