# Summary_L3_1000steps_1e6rate_05beta_CSFTDPO
This model is a fine-tuned version of tsavage68/Summary_L3_1000steps_1e7rate_SFT2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5961
- Rewards/chosen: 0.1158
- Rewards/rejected: -2.7330
- Rewards/accuracies: 0.1400
- Rewards/margins: 2.8488
- Logps/rejected: -20.7298
- Logps/chosen: -9.1512
- Logits/rejected: -1.1135
- Logits/chosen: -1.1149
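Under DPO, the reward columns above are the implicit rewards: β times the log-probability ratio between the policy and the reference (SFT) model, and the margin is simply chosen reward minus rejected reward. A minimal sketch of how these quantities relate (the helper name is hypothetical, and β = 0.5 is an assumption inferred from the "05beta" suffix in the model name):

```python
import math

BETA = 0.5  # assumed from the "05beta" suffix in the model name


def dpo_metrics(logp_chosen, ref_logp_chosen,
                logp_rejected, ref_logp_rejected, beta=BETA):
    """Implicit DPO rewards and per-pair loss from sequence log-probs."""
    # Reward is beta times the log-ratio between policy and reference model.
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # Per-pair DPO loss: -log sigmoid(margin), written stably via log1p.
    loss = math.log1p(math.exp(-margin))
    return reward_chosen, reward_rejected, margin, loss
```

Consistent with this, the final eval margin is 0.1158 − (−2.7330) = 2.8488. Note that the reported loss (0.5961) is averaged over evaluation pairs, so it need not equal −log σ of the mean margin.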
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
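The hyperparameters above can be restated as a trl-style keyword dict. This is a hypothetical reconstruction, not the original training script: the key names follow trl/transformers conventions, and `beta` is assumed from the model name.

```python
# Hypothetical restatement of the listed hyperparameters in the style of
# trl's DPOConfig / transformers' TrainingArguments (not the actual script).
dpo_kwargs = {
    "learning_rate": 1e-6,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 100,
    "max_steps": 1000,
    # Adam with betas=(0.9, 0.999), epsilon=1e-8, as listed above.
    "beta": 0.5,  # assumption inferred from the "05beta" model name
}

# The "total_train_batch_size" of 4 is derived, not set directly:
total_train_batch_size = (dpo_kwargs["per_device_train_batch_size"]
                          * dpo_kwargs["gradient_accumulation_steps"])
```

The effective batch size of 4 comes from accumulating 4 gradient steps over a per-device batch of 1.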
### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.555         | 0.2004 | 50   | 0.5962          | 0.0976         | -1.3577          | 0.1400             | 1.4553          | -17.9791       | -9.1876      | -1.0985         | -1.1002       |
| 0.6585        | 0.4008 | 100  | 0.5962          | 0.1094         | -1.5231          | 0.1400             | 1.6326          | -18.3100       | -9.1639      | -1.1003         | -1.1019       |
| 0.6238        | 0.6012 | 150  | 0.5961          | 0.1341         | -2.2789          | 0.1400             | 2.4130          | -19.8216       | -9.1145      | -1.1048         | -1.1065       |
| 0.6065        | 0.8016 | 200  | 0.5961          | 0.1193         | -2.7271          | 0.1400             | 2.8464          | -20.7179       | -9.1442      | -1.1137         | -1.1150       |
| 0.6238        | 1.0020 | 250  | 0.5961          | 0.1211         | -2.7359          | 0.1400             | 2.8570          | -20.7355       | -9.1407      | -1.1133         | -1.1146       |
| 0.6238        | 1.2024 | 300  | 0.5961          | 0.1211         | -2.7359          | 0.1400             | 2.8570          | -20.7355       | -9.1407      | -1.1133         | -1.1146       |
| 0.6238        | 1.4028 | 350  | 0.5961          | 0.1226         | -2.7319          | 0.1400             | 2.8545          | -20.7275       | -9.1376      | -1.1131         | -1.1144       |
| 0.5718        | 1.6032 | 400  | 0.5961          | 0.1226         | -2.7319          | 0.1400             | 2.8545          | -20.7275       | -9.1376      | -1.1131         | -1.1144       |
| 0.5892        | 1.8036 | 450  | 0.5961          | 0.1196         | -2.7246          | 0.1400             | 2.8442          | -20.7129       | -9.1435      | -1.1135         | -1.1147       |
| 0.5718        | 2.0040 | 500  | 0.5961          | 0.1211         | -2.7256          | 0.1400             | 2.8467          | -20.7150       | -9.1406      | -1.1135         | -1.1147       |
| 0.5718        | 2.2044 | 550  | 0.5961          | 0.1207         | -2.7233          | 0.1400             | 2.8439          | -20.7103       | -9.1414      | -1.1134         | -1.1147       |
| 0.5545        | 2.4048 | 600  | 0.5961          | 0.1207         | -2.7233          | 0.1400             | 2.8439          | -20.7103       | -9.1414      | -1.1134         | -1.1147       |
| 0.5199        | 2.6052 | 650  | 0.5961          | 0.1207         | -2.7233          | 0.1400             | 2.8439          | -20.7103       | -9.1414      | -1.1134         | -1.1147       |
| 0.6238        | 2.8056 | 700  | 0.5961          | 0.1207         | -2.7233          | 0.1400             | 2.8439          | -20.7103       | -9.1414      | -1.1134         | -1.1147       |
| 0.6065        | 3.0060 | 750  | 0.5961          | 0.1181         | -2.7332          | 0.1400             | 2.8513          | -20.7302       | -9.1466      | -1.1134         | -1.1147       |
| 0.6412        | 3.2064 | 800  | 0.5961          | 0.1124         | -2.7370          | 0.1400             | 2.8494          | -20.7378       | -9.1580      | -1.1135         | -1.1148       |
| 0.6585        | 3.4068 | 850  | 0.5961          | 0.1124         | -2.7370          | 0.1400             | 2.8494          | -20.7378       | -9.1580      | -1.1135         | -1.1148       |
| 0.6238        | 3.6072 | 900  | 0.5961          | 0.1148         | -2.7352          | 0.1400             | 2.8500          | -20.7342       | -9.1532      | -1.1135         | -1.1149       |
| 0.5372        | 3.8076 | 950  | 0.5961          | 0.1148         | -2.7352          | 0.1400             | 2.8500          | -20.7342       | -9.1532      | -1.1135         | -1.1149       |
| 0.6238        | 4.0080 | 1000 | 0.5961          | 0.1158         | -2.7330          | 0.1400             | 2.8488          | -20.7298       | -9.1512      | -1.1135         | -1.1149       |
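The Epoch and Step columns together imply the size of the (unnamed) training set: with an effective batch of 4, the 50 optimizer steps that cover epoch 0.2004 correspond to roughly 998 examples. A quick sanity check of that arithmetic:

```python
# Derive the approximate training-set size from the table above.
# samples seen = steps * total_train_batch_size; epochs = samples / dataset_size.
steps, epoch, total_batch = 50, 0.2004, 4
dataset_size = steps * total_batch / epoch  # ~998 examples

# Cross-check against a later row: step 250 at epoch 1.0020.
dataset_size_check = 250 * total_batch / 1.0020
```

Both rows give the same estimate, so the epoch bookkeeping is internally consistent.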
### Framework versions
- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1