# Hyponatremia_L3_1000steps_1e8rate_05beta_DPO
This model is a fine-tuned version of tsavage68/Hyponatremia_L3_450steps_1e7rate_SFT on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6800
- Rewards/chosen: 0.0111
- Rewards/rejected: -0.0183
- Rewards/accuracies: 0.6300
- Rewards/margins: 0.0293
- Logps/rejected: -39.4634
- Logps/chosen: -22.6947
- Logits/rejected: -1.0185
- Logits/chosen: -0.9455
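The reward columns above are DPO's implicit rewards, β·(log πθ − log πref), evaluated on the chosen and rejected responses. As a minimal sketch (assuming β = 0.5, inferred from the "05beta" in the model name, and hypothetical per-sequence log-probabilities chosen to resemble the reported Logps values):

```python
import math

def dpo_loss(beta, logp_chosen, logp_rejected, ref_chosen, ref_rejected):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where each reward is beta * (policy logp - reference logp)."""
    reward_chosen = beta * (logp_chosen - ref_chosen)
    reward_rejected = beta * (logp_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected, margin

# Hypothetical pair: the policy slightly prefers the chosen response
# relative to the reference model.
loss, rc, rr, m = dpo_loss(0.5, -22.69, -39.46, -22.71, -39.42)
```

With a margin of about 0.03 the loss lands near 0.68, consistent with the evaluation loss reported above (the DPO loss starts at ln 2 ≈ 0.693 when the margin is zero).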
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
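The learning-rate schedule implied by these settings can be sketched as linear warmup over 100 steps followed by cosine decay to zero over the remaining 900 (this mirrors the usual behavior of `lr_scheduler_type: cosine` with warmup; the exact curve depends on the training framework):

```python
import math

def lr_at(step, peak_lr=1e-08, warmup_steps=100, total_steps=1000):
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Ramps to the 1e-08 peak at step 100, is halfway down at step 550,
# and reaches ~0 by step 1000.
```

Note also that the effective batch size of 4 follows from train_batch_size × gradient_accumulation_steps = 2 × 2.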
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7087 | 0.2667 | 50 | 0.6904 | 0.0099 | 0.0022 | 0.5600 | 0.0077 | -39.4225 | -22.6970 | -1.0181 | -0.9449 |
0.7054 | 0.5333 | 100 | 0.6945 | 0.0150 | 0.0155 | 0.4700 | -0.0005 | -39.3959 | -22.6868 | -1.0188 | -0.9457 |
0.6792 | 0.8 | 150 | 0.6916 | 0.0089 | 0.0036 | 0.5100 | 0.0052 | -39.4196 | -22.6991 | -1.0191 | -0.9458 |
0.6726 | 1.0667 | 200 | 0.6884 | 0.0071 | -0.0042 | 0.5200 | 0.0114 | -39.4353 | -22.7026 | -1.0195 | -0.9464 |
0.6877 | 1.3333 | 250 | 0.6869 | 0.0113 | -0.0039 | 0.5600 | 0.0152 | -39.4347 | -22.6943 | -1.0183 | -0.9452 |
0.6655 | 1.6 | 300 | 0.6882 | 0.0126 | 0.0002 | 0.5700 | 0.0124 | -39.4264 | -22.6915 | -1.0193 | -0.9460 |
0.6734 | 1.8667 | 350 | 0.6903 | 0.0156 | 0.0077 | 0.5400 | 0.0078 | -39.4113 | -22.6856 | -1.0194 | -0.9463 |
0.6759 | 2.1333 | 400 | 0.6839 | 0.0065 | -0.0142 | 0.6000 | 0.0207 | -39.4553 | -22.7038 | -1.0189 | -0.9459 |
0.6775 | 2.4 | 450 | 0.6768 | 0.0146 | -0.0209 | 0.6600 | 0.0355 | -39.4687 | -22.6875 | -1.0184 | -0.9453 |
0.692 | 2.6667 | 500 | 0.6800 | 0.0192 | -0.0094 | 0.6000 | 0.0286 | -39.4456 | -22.6784 | -1.0192 | -0.9462 |
0.6805 | 2.9333 | 550 | 0.6807 | 0.0136 | -0.0142 | 0.5700 | 0.0278 | -39.4552 | -22.6895 | -1.0194 | -0.9463 |
0.6711 | 3.2 | 600 | 0.6819 | 0.0058 | -0.0191 | 0.6300 | 0.0248 | -39.4650 | -22.7053 | -1.0191 | -0.9460 |
0.6642 | 3.4667 | 650 | 0.6796 | 0.0124 | -0.0172 | 0.5800 | 0.0296 | -39.4612 | -22.6920 | -1.0190 | -0.9458 |
0.6798 | 3.7333 | 700 | 0.6861 | 0.0179 | 0.0012 | 0.5500 | 0.0167 | -39.4244 | -22.6810 | -1.0189 | -0.9457 |
0.6845 | 4.0 | 750 | 0.6807 | 0.0102 | -0.0177 | 0.6200 | 0.0278 | -39.4621 | -22.6965 | -1.0185 | -0.9454 |
0.6829 | 4.2667 | 800 | 0.6813 | 0.0097 | -0.0170 | 0.6100 | 0.0267 | -39.4609 | -22.6974 | -1.0185 | -0.9454 |
0.6779 | 4.5333 | 850 | 0.6802 | 0.0106 | -0.0182 | 0.6300 | 0.0288 | -39.4632 | -22.6955 | -1.0185 | -0.9455 |
0.6738 | 4.8 | 900 | 0.6800 | 0.0111 | -0.0183 | 0.6300 | 0.0293 | -39.4634 | -22.6947 | -1.0185 | -0.9455 |
0.6731 | 5.0667 | 950 | 0.6800 | 0.0111 | -0.0183 | 0.6300 | 0.0293 | -39.4634 | -22.6947 | -1.0185 | -0.9455 |
0.6674 | 5.3333 | 1000 | 0.6800 | 0.0111 | -0.0183 | 0.6300 | 0.0293 | -39.4634 | -22.6947 | -1.0185 | -0.9455 |
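The Rewards/accuracies column is the fraction of evaluation pairs in which the chosen response's implicit reward exceeds the rejected response's. A minimal sketch (with made-up reward values for illustration):

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of preference pairs where the chosen response
    out-scores the rejected one."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

# Two of these four hypothetical pairs are ranked correctly.
acc = reward_accuracy([0.2, -0.1, 0.05, 0.3], [0.1, 0.0, -0.2, 0.4])
```

By this measure, the final checkpoint ranks the chosen response above the rejected one on 63 of every 100 evaluation pairs.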
### Framework versions
- Transformers 4.42.3
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1
## Base model

- meta-llama/Meta-Llama-3-8B-Instruct