rasyosef committed on
Commit 020f385
1 Parent(s): ffb2a55

End of training

Files changed (1)
  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
---
base_model: rasyosef/phi-1_5-sft-openhermes-v2
library_name: peft
license: mit
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: phi-1_5-openhermesv2-dpo-combinedv3
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi-1_5-openhermesv2-dpo-combinedv3

This model is a fine-tuned version of [rasyosef/phi-1_5-sft-openhermes-v2](https://huggingface.co/rasyosef/phi-1_5-sft-openhermes-v2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5013
- Rewards/chosen: -1.0250
- Rewards/rejected: -2.3893
- Rewards/accuracies: 0.7283
- Rewards/margins: 1.3643
- Logps/rejected: -162.0916
- Logps/chosen: -128.1033
- Logits/rejected: 5.3082
- Logits/chosen: 5.1890

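Since the card lists `library_name: peft`, this repository presumably hosts a LoRA/PEFT adapter on top of the base model rather than full standalone weights. The snippet below is a minimal loading sketch under that assumption; the adapter repo id is inferred from the model name above and the tokenizer is taken from the base model, neither of which the card states explicitly.

```python
# Minimal sketch: load the base model plus this DPO adapter via PEFT.
# Assumptions: the adapter repo id matches the model name in this card,
# and the base model's tokenizer is reused unchanged.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rasyosef/phi-1_5-openhermesv2-dpo-combinedv3"  # assumed repo id

# AutoPeftModelForCausalLM resolves the base model from the adapter config
# and loads the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("rasyosef/phi-1_5-sft-openhermes-v2")

prompt = "Explain what direct preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the SFT base model may expect a specific instruction or chat format; the plain-text prompt above is only for illustration.
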
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3

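For readers who want to reproduce a comparable run, the hyperparameters above map fairly directly onto `trl`'s DPO trainer. The sketch below is an illustrative reconstruction, not the original training script: the `trl` version, the DPO `beta`, the LoRA configuration, and the preference dataset id are assumptions that this card does not state.

```python
# Illustrative sketch of a DPO run matching the hyperparameters listed above.
# Assumptions: a recent trl release where DPOConfig carries the DPO settings,
# beta=0.1, a hypothetical LoRA config, and a hypothetical preference dataset
# with "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "rasyosef/phi-1_5-sft-openhermes-v2"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

dataset = load_dataset("rasyosef/dpo-combined-v3")  # hypothetical dataset id

peft_config = LoraConfig(  # assumed adapter settings, not stated in the card
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
)

args = DPOConfig(
    output_dir="phi-1_5-openhermesv2-dpo-combinedv3",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=2,   # total_train_batch_size: 16
    lr_scheduler_type="cosine",
    warmup_steps=300,
    num_train_epochs=3,
    seed=42,
    beta=0.1,                        # assumed; not listed in the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],    # hypothetical eval split
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```
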
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6899 | 0.1241 | 138 | 0.6769 | -0.0153 | -0.0504 | 0.625 | 0.0351 | -138.7025 | -118.0066 | 4.5710 | 4.4532 |
| 0.6309 | 0.2482 | 276 | 0.6035 | -0.2012 | -0.5586 | 0.7120 | 0.3575 | -143.7850 | -119.8655 | 4.5167 | 4.3940 |
| 0.5756 | 0.3723 | 414 | 0.5669 | -0.3693 | -0.9842 | 0.7174 | 0.6149 | -148.0405 | -121.5467 | 4.6242 | 4.5060 |
| 0.5715 | 0.4964 | 552 | 0.5446 | -0.4109 | -1.1855 | 0.7283 | 0.7745 | -150.0534 | -121.9633 | 4.7324 | 4.6143 |
| 0.5449 | 0.6205 | 690 | 0.5331 | -0.4666 | -1.3090 | 0.7446 | 0.8424 | -151.2884 | -122.5196 | 4.8229 | 4.7080 |
| 0.5536 | 0.7446 | 828 | 0.5136 | -0.4885 | -1.3825 | 0.7446 | 0.8940 | -152.0234 | -122.7389 | 4.8867 | 4.7737 |
| 0.5253 | 0.8687 | 966 | 0.5057 | -0.5613 | -1.5446 | 0.7554 | 0.9832 | -153.6442 | -123.4672 | 4.9287 | 4.8080 |
| 0.5249 | 0.9928 | 1104 | 0.5054 | -0.5101 | -1.4656 | 0.75 | 0.9555 | -152.8544 | -122.9549 | 4.8704 | 4.7521 |
| 0.4631 | 1.1169 | 1242 | 0.5067 | -0.6889 | -1.7678 | 0.75 | 1.0789 | -155.8768 | -124.7426 | 4.8470 | 4.7276 |
| 0.4524 | 1.2410 | 1380 | 0.5006 | -0.7467 | -1.9049 | 0.7446 | 1.1582 | -157.2474 | -125.3205 | 4.9447 | 4.8239 |
| 0.424 | 1.3651 | 1518 | 0.5036 | -0.7638 | -2.0144 | 0.7337 | 1.2505 | -158.3425 | -125.4923 | 4.9235 | 4.8002 |
| 0.4428 | 1.4892 | 1656 | 0.5004 | -0.7790 | -2.0132 | 0.7446 | 1.2342 | -158.3307 | -125.6437 | 4.9576 | 4.8375 |
| 0.4424 | 1.6133 | 1794 | 0.4944 | -0.8220 | -2.0517 | 0.7391 | 1.2297 | -158.7152 | -126.0739 | 4.9736 | 4.8553 |
| 0.4358 | 1.7374 | 1932 | 0.5022 | -0.8091 | -1.9993 | 0.7228 | 1.1902 | -158.1918 | -125.9447 | 5.0894 | 4.9702 |
| 0.4426 | 1.8615 | 2070 | 0.4992 | -0.8254 | -2.0308 | 0.7228 | 1.2054 | -158.5065 | -126.1077 | 5.0943 | 4.9780 |
| 0.4226 | 1.9856 | 2208 | 0.4971 | -0.8701 | -2.1434 | 0.7283 | 1.2733 | -159.6329 | -126.5553 | 5.1222 | 5.0011 |
| 0.3684 | 2.1097 | 2346 | 0.5032 | -0.9201 | -2.2281 | 0.7228 | 1.3081 | -160.4799 | -127.0545 | 5.2209 | 5.1031 |
| 0.3695 | 2.2338 | 2484 | 0.5022 | -0.9332 | -2.2651 | 0.7228 | 1.3319 | -160.8495 | -127.1860 | 5.2170 | 5.0977 |
| 0.3693 | 2.3579 | 2622 | 0.5022 | -0.9418 | -2.2839 | 0.7283 | 1.3421 | -161.0379 | -127.2717 | 5.2390 | 5.1169 |
| 0.3659 | 2.4820 | 2760 | 0.5037 | -0.9820 | -2.3392 | 0.7228 | 1.3572 | -161.5908 | -127.6742 | 5.2392 | 5.1148 |
| 0.3557 | 2.6061 | 2898 | 0.5031 | -1.0001 | -2.3531 | 0.7228 | 1.3529 | -161.7294 | -127.8552 | 5.2704 | 5.1488 |
| 0.3491 | 2.7302 | 3036 | 0.5053 | -1.0242 | -2.3803 | 0.7228 | 1.3562 | -162.0017 | -128.0954 | 5.2880 | 5.1693 |
| 0.3512 | 2.8543 | 3174 | 0.5036 | -1.0265 | -2.3833 | 0.7174 | 1.3568 | -162.0320 | -128.1190 | 5.2965 | 5.1768 |
| 0.3458 | 2.9784 | 3312 | 0.5013 | -1.0250 | -2.3893 | 0.7283 | 1.3643 | -162.0916 | -128.1033 | 5.3082 | 5.1890 |

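A note on the reward columns: assuming the standard DPO formulation (Rafailov et al., 2023) as implemented in `trl`, the logged rewards are the implicit, β-scaled log-probability ratios of the policy against the reference model, and the loss is the negative log-sigmoid of their margin:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)
$$

where $y_w$ and $y_l$ are the chosen and rejected responses. Under this reading, `Rewards/margins` is the mean gap between chosen and rejected rewards and `Rewards/accuracies` is the fraction of evaluation pairs where the chosen response receives the higher reward; in the table above the margin widens steadily across the three epochs while accuracy settles around 0.72–0.75.
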
### Framework versions

- PEFT 0.11.1
- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1