martimfasantos committed

Commit 11b4c66
1 Parent(s): 4aa8f94

Model save
README.md CHANGED
@@ -2,15 +2,9 @@
  license: apache-2.0
  base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
  tags:
- - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
- - trl
- - dpo
- - generated_from_trainer
- datasets:
- - openai/summarize_from_feedback
  model-index:
  - name: tinyllama-1.1b-sum-dpo-full
  results: []
@@ -21,17 +15,17 @@ should probably proofread and complete it, then remove this comment. -->

  # tinyllama-1.1b-sum-dpo-full

- This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on the openai/summarize_from_feedback dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.6342
- - Rewards/chosen: -1.8568
- - Rewards/rejected: -2.3204
- - Rewards/accuracies: 0.6580
- - Rewards/margins: 0.4635
- - Logps/rejected: -295.1929
- - Logps/chosen: -244.3875
- - Logits/rejected: -1.3920
- - Logits/chosen: -1.4190

  ## Model description

@@ -50,7 +44,7 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 5e-07
  - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
@@ -64,129 +58,129 @@ The following hyperparameters were used during training:

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
- |:-------------:|:-----:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
- | 0.6929 | 0.02 | 100 | 0.6932 | -0.0000 | 0.0000 | 0.4986 | -0.0000 | -63.1568 | -58.7055 | -3.1598 | -3.1655 |
- | 0.693 | 0.03 | 200 | 0.6931 | 0.0002 | 0.0002 | 0.5128 | 0.0000 | -63.1375 | -58.6803 | -3.1596 | -3.1653 |
- | 0.6926 | 0.05 | 300 | 0.6930 | 0.0006 | 0.0003 | 0.5395 | 0.0003 | -63.1272 | -58.6442 | -3.1581 | -3.1638 |
- | 0.691 | 0.07 | 400 | 0.6926 | 0.0014 | 0.0004 | 0.5611 | 0.0010 | -63.1156 | -58.5606 | -3.1547 | -3.1603 |
- | 0.6907 | 0.09 | 500 | 0.6921 | 0.0021 | -0.0000 | 0.5755 | 0.0021 | -63.1621 | -58.4974 | -3.1459 | -3.1515 |
- | 0.6852 | 0.1 | 600 | 0.6915 | 0.0010 | -0.0025 | 0.5822 | 0.0035 | -63.4056 | -58.6003 | -3.1331 | -3.1388 |
- | 0.6854 | 0.12 | 700 | 0.6905 | -0.0024 | -0.0080 | 0.5895 | 0.0056 | -63.9547 | -58.9453 | -3.1150 | -3.1207 |
- | 0.6829 | 0.14 | 800 | 0.6887 | -0.0198 | -0.0294 | 0.5734 | 0.0097 | -66.0990 | -60.6796 | -3.0887 | -3.0944 |
- | 0.6773 | 0.16 | 900 | 0.6863 | -0.0499 | -0.0651 | 0.5929 | 0.0152 | -69.6642 | -63.6925 | -3.0513 | -3.0570 |
- | 0.6818 | 0.17 | 1000 | 0.6837 | -0.0860 | -0.1071 | 0.5971 | 0.0212 | -73.8714 | -67.3013 | -3.0031 | -3.0087 |
- | 0.6715 | 0.19 | 1100 | 0.6800 | -0.1307 | -0.1606 | 0.6057 | 0.0300 | -79.2216 | -71.7704 | -2.9405 | -2.9461 |
- | 0.6651 | 0.21 | 1200 | 0.6756 | -0.1933 | -0.2344 | 0.5997 | 0.0411 | -86.5957 | -78.0297 | -2.8460 | -2.8516 |
- | 0.663 | 0.22 | 1300 | 0.6691 | -0.2828 | -0.3409 | 0.6171 | 0.0581 | -97.2443 | -86.9854 | -2.7796 | -2.7856 |
- | 0.6329 | 0.24 | 1400 | 0.6610 | -0.3769 | -0.4582 | 0.6185 | 0.0813 | -108.9814 | -96.3935 | -2.6744 | -2.6805 |
- | 0.6356 | 0.26 | 1500 | 0.6537 | -0.4858 | -0.5921 | 0.6380 | 0.1063 | -122.3668 | -107.2818 | -2.5109 | -2.5177 |
- | 0.6275 | 0.28 | 1600 | 0.6452 | -0.5829 | -0.7205 | 0.6364 | 0.1376 | -135.2118 | -116.9967 | -2.4086 | -2.4171 |
- | 0.6315 | 0.29 | 1700 | 0.6434 | -0.5896 | -0.7345 | 0.6336 | 0.1449 | -136.6092 | -117.6634 | -2.3275 | -2.3370 |
- | 0.6166 | 0.31 | 1800 | 0.6394 | -0.7915 | -0.9646 | 0.6289 | 0.1731 | -159.6184 | -137.8539 | -2.0875 | -2.0994 |
- | 0.6238 | 0.33 | 1900 | 0.6394 | -0.9314 | -1.1088 | 0.6280 | 0.1774 | -174.0358 | -151.8405 | -1.9646 | -1.9768 |
- | 0.5824 | 0.34 | 2000 | 0.6345 | -0.9755 | -1.1825 | 0.6338 | 0.2070 | -181.4065 | -156.2569 | -1.9742 | -1.9884 |
- | 0.5895 | 0.36 | 2100 | 0.6449 | -0.6585 | -0.8078 | 0.6338 | 0.1493 | -143.9416 | -124.5552 | -1.9401 | -1.9533 |
- | 0.5633 | 0.38 | 2200 | 0.6434 | -0.6348 | -0.7894 | 0.6248 | 0.1546 | -142.1007 | -122.1877 | -1.9416 | -1.9549 |
- | 0.5459 | 0.4 | 2300 | 0.6320 | -1.0811 | -1.3378 | 0.6301 | 0.2566 | -196.9343 | -166.8161 | -1.6992 | -1.7183 |
- | 0.5786 | 0.41 | 2400 | 0.6306 | -1.1984 | -1.4632 | 0.6292 | 0.2649 | -209.4779 | -178.5388 | -1.6168 | -1.6363 |
- | 0.5679 | 0.43 | 2500 | 0.6330 | -0.9020 | -1.1230 | 0.6345 | 0.2210 | -175.4528 | -148.9024 | -1.7044 | -1.7220 |
- | 0.5426 | 0.45 | 2600 | 0.6352 | -0.8874 | -1.0910 | 0.6355 | 0.2037 | -172.2623 | -147.4389 | -1.7825 | -1.7993 |
- | 0.5888 | 0.47 | 2700 | 0.6303 | -0.9094 | -1.1295 | 0.6452 | 0.2201 | -176.1057 | -149.6399 | -1.8294 | -1.8467 |
- | 0.6328 | 0.48 | 2800 | 0.6316 | -0.8366 | -1.0424 | 0.6420 | 0.2058 | -167.4005 | -142.3680 | -1.8252 | -1.8423 |
- | 0.5746 | 0.5 | 2900 | 0.6267 | -1.0547 | -1.3045 | 0.6443 | 0.2499 | -193.6111 | -164.1712 | -1.7076 | -1.7276 |
- | 0.5452 | 0.52 | 3000 | 0.6288 | -0.9206 | -1.1541 | 0.6464 | 0.2335 | -178.5692 | -150.7609 | -1.7363 | -1.7558 |
- | 0.5525 | 0.53 | 3100 | 0.6231 | -1.0317 | -1.3080 | 0.6564 | 0.2763 | -193.9615 | -161.8740 | -1.6101 | -1.6323 |
- | 0.6097 | 0.55 | 3200 | 0.6201 | -1.0912 | -1.3708 | 0.6554 | 0.2796 | -200.2384 | -167.8213 | -1.5903 | -1.6121 |
- | 0.5807 | 0.57 | 3300 | 0.6239 | -1.1017 | -1.3657 | 0.6506 | 0.2640 | -199.7250 | -168.8761 | -1.5292 | -1.5503 |
- | 0.536 | 0.59 | 3400 | 0.6312 | -0.8275 | -1.0409 | 0.6466 | 0.2134 | -167.2509 | -141.4572 | -1.7056 | -1.7240 |
- | 0.5392 | 0.6 | 3500 | 0.6287 | -1.0262 | -1.2804 | 0.6466 | 0.2542 | -191.1944 | -161.3248 | -1.6386 | -1.6596 |
- | 0.5689 | 0.62 | 3600 | 0.6275 | -1.1210 | -1.3785 | 0.6487 | 0.2574 | -201.0063 | -170.8087 | -1.6286 | -1.6494 |
- | 0.517 | 0.64 | 3700 | 0.6244 | -1.2262 | -1.5240 | 0.6566 | 0.2979 | -215.5612 | -181.3195 | -1.4999 | -1.5238 |
- | 0.5368 | 0.65 | 3800 | 0.6207 | -1.2368 | -1.5309 | 0.6580 | 0.2941 | -216.2485 | -182.3809 | -1.5010 | -1.5237 |
- | 0.5382 | 0.67 | 3900 | 0.6221 | -1.0150 | -1.2770 | 0.6596 | 0.2620 | -190.8593 | -160.2047 | -1.6362 | -1.6580 |
- | 0.5399 | 0.69 | 4000 | 0.6212 | -1.1703 | -1.4644 | 0.6599 | 0.2941 | -209.6013 | -175.7381 | -1.4870 | -1.5105 |
- | 0.5175 | 0.71 | 4100 | 0.6203 | -1.2765 | -1.5905 | 0.6554 | 0.3140 | -222.2049 | -186.3498 | -1.4476 | -1.4722 |
- | 0.5803 | 0.72 | 4200 | 0.6208 | -1.3529 | -1.6862 | 0.6624 | 0.3332 | -231.7760 | -193.9977 | -1.4322 | -1.4581 |
- | 0.507 | 0.74 | 4300 | 0.6265 | -0.9361 | -1.1863 | 0.6624 | 0.2501 | -181.7826 | -152.3180 | -1.5738 | -1.5955 |
- | 0.5273 | 0.76 | 4400 | 0.6211 | -1.2719 | -1.6087 | 0.6687 | 0.3368 | -224.0267 | -185.8899 | -1.4048 | -1.4308 |
- | 0.5574 | 0.78 | 4500 | 0.6233 | -1.1065 | -1.4002 | 0.6671 | 0.2937 | -203.1787 | -169.3536 | -1.4729 | -1.4964 |
- | 0.4819 | 0.79 | 4600 | 0.6219 | -1.1036 | -1.4017 | 0.6643 | 0.2981 | -203.3253 | -169.0589 | -1.5017 | -1.5251 |
- | 0.5187 | 0.81 | 4700 | 0.6172 | -1.4659 | -1.8338 | 0.6654 | 0.3680 | -246.5411 | -205.2918 | -1.3670 | -1.3935 |
- | 0.5805 | 0.83 | 4800 | 0.6146 | -1.4235 | -1.7810 | 0.6619 | 0.3575 | -241.2558 | -201.0503 | -1.4196 | -1.4453 |
- | 0.537 | 0.84 | 4900 | 0.6194 | -1.2089 | -1.5178 | 0.6557 | 0.3089 | -214.9402 | -179.5929 | -1.5222 | -1.5460 |
- | 0.5112 | 0.86 | 5000 | 0.6177 | -1.5091 | -1.8730 | 0.6580 | 0.3638 | -250.4540 | -209.6180 | -1.4013 | -1.4276 |
- | 0.5746 | 0.88 | 5100 | 0.6200 | -1.2224 | -1.5393 | 0.6654 | 0.3168 | -217.0836 | -180.9476 | -1.5328 | -1.5572 |
- | 0.5138 | 0.9 | 5200 | 0.6237 | -1.0419 | -1.3187 | 0.6605 | 0.2768 | -195.0258 | -162.8902 | -1.6006 | -1.6232 |
- | 0.5094 | 0.91 | 5300 | 0.6181 | -1.2868 | -1.6160 | 0.6599 | 0.3293 | -224.7612 | -187.3815 | -1.5180 | -1.5428 |
- | 0.4865 | 0.93 | 5400 | 0.6222 | -1.2264 | -1.5437 | 0.6698 | 0.3173 | -217.5302 | -181.3466 | -1.5197 | -1.5443 |
- | 0.513 | 0.95 | 5500 | 0.6214 | -1.1371 | -1.4265 | 0.6722 | 0.2894 | -205.8068 | -172.4182 | -1.5651 | -1.5876 |
- | 0.5474 | 0.96 | 5600 | 0.6201 | -1.1854 | -1.4951 | 0.6689 | 0.3097 | -212.6680 | -177.2486 | -1.5109 | -1.5347 |
- | 0.5291 | 0.98 | 5700 | 0.6191 | -1.1659 | -1.4788 | 0.6696 | 0.3130 | -211.0420 | -175.2930 | -1.5209 | -1.5449 |
- | 0.496 | 1.0 | 5800 | 0.6148 | -1.5172 | -1.9032 | 0.6680 | 0.3860 | -253.4752 | -210.4265 | -1.4163 | -1.4435 |
- | 0.3739 | 1.02 | 5900 | 0.6216 | -1.5454 | -1.9612 | 0.6626 | 0.4157 | -259.2733 | -213.2480 | -1.3429 | -1.3716 |
- | 0.3835 | 1.03 | 6000 | 0.6214 | -1.8273 | -2.3125 | 0.6671 | 0.4851 | -294.4050 | -241.4372 | -1.2869 | -1.3177 |
- | 0.3822 | 1.05 | 6100 | 0.6230 | -2.0009 | -2.5009 | 0.6710 | 0.4999 | -313.2448 | -258.7976 | -1.2163 | -1.2471 |
- | 0.4249 | 1.07 | 6200 | 0.6216 | -1.5166 | -1.9264 | 0.6657 | 0.4098 | -255.7980 | -210.3597 | -1.4188 | -1.4463 |
- | 0.4731 | 1.09 | 6300 | 0.6206 | -1.7045 | -2.1531 | 0.6654 | 0.4486 | -278.4628 | -229.1491 | -1.3768 | -1.4055 |
- | 0.4089 | 1.1 | 6400 | 0.6263 | -1.9433 | -2.4330 | 0.6643 | 0.4897 | -306.4561 | -253.0356 | -1.2985 | -1.3283 |
- | 0.4055 | 1.12 | 6500 | 0.6263 | -1.6156 | -2.0285 | 0.6657 | 0.4128 | -266.0024 | -220.2685 | -1.4228 | -1.4496 |
- | 0.4373 | 1.14 | 6600 | 0.6319 | -1.9163 | -2.3889 | 0.6615 | 0.4726 | -302.0515 | -250.3334 | -1.3870 | -1.4154 |
- | 0.4568 | 1.15 | 6700 | 0.6347 | -1.7086 | -2.1521 | 0.6575 | 0.4435 | -278.3696 | -229.5625 | -1.4138 | -1.4419 |
- | 0.396 | 1.17 | 6800 | 0.6304 | -1.8382 | -2.2997 | 0.6694 | 0.4614 | -293.1244 | -242.5259 | -1.3792 | -1.4074 |
- | 0.4312 | 1.19 | 6900 | 0.6330 | -2.0759 | -2.5709 | 0.6645 | 0.4950 | -320.2516 | -266.2965 | -1.3565 | -1.3853 |
- | 0.4144 | 1.21 | 7000 | 0.6300 | -1.5474 | -1.9476 | 0.6587 | 0.4001 | -257.9128 | -213.4480 | -1.5128 | -1.5385 |
- | 0.4501 | 1.22 | 7100 | 0.6320 | -1.5691 | -1.9654 | 0.6510 | 0.3963 | -259.6932 | -215.6143 | -1.4579 | -1.4834 |
- | 0.4303 | 1.24 | 7200 | 0.6323 | -1.7741 | -2.2060 | 0.6538 | 0.4319 | -283.7571 | -236.1103 | -1.4104 | -1.4369 |
- | 0.4717 | 1.26 | 7300 | 0.6294 | -1.8573 | -2.3122 | 0.6668 | 0.4549 | -294.3745 | -244.4295 | -1.3985 | -1.4254 |
- | 0.3908 | 1.27 | 7400 | 0.6307 | -1.6832 | -2.1090 | 0.6568 | 0.4258 | -274.0572 | -227.0262 | -1.4235 | -1.4501 |
- | 0.4618 | 1.29 | 7500 | 0.6276 | -1.5299 | -1.9160 | 0.6531 | 0.3861 | -254.7590 | -211.6911 | -1.4812 | -1.5060 |
- | 0.5019 | 1.31 | 7600 | 0.6301 | -1.8422 | -2.2951 | 0.6624 | 0.4529 | -292.6649 | -242.9215 | -1.4008 | -1.4277 |
- | 0.4239 | 1.33 | 7700 | 0.6266 | -1.6098 | -2.0240 | 0.6633 | 0.4142 | -265.5571 | -219.6812 | -1.4540 | -1.4801 |
- | 0.4156 | 1.34 | 7800 | 0.6327 | -1.9969 | -2.4832 | 0.6638 | 0.4864 | -311.4807 | -258.3907 | -1.3619 | -1.3900 |
- | 0.418 | 1.36 | 7900 | 0.6321 | -1.7670 | -2.2060 | 0.6578 | 0.4391 | -283.7597 | -235.3999 | -1.4207 | -1.4475 |
- | 0.4084 | 1.38 | 8000 | 0.6318 | -1.8853 | -2.3451 | 0.6638 | 0.4598 | -297.6674 | -247.2307 | -1.3816 | -1.4088 |
- | 0.4616 | 1.4 | 8100 | 0.6337 | -1.6779 | -2.0977 | 0.6564 | 0.4198 | -272.9300 | -226.4922 | -1.4319 | -1.4581 |
- | 0.4033 | 1.41 | 8200 | 0.6331 | -1.8711 | -2.3312 | 0.6638 | 0.4601 | -296.2737 | -245.8150 | -1.3845 | -1.4116 |
- | 0.4659 | 1.43 | 8300 | 0.6338 | -1.9457 | -2.4103 | 0.6643 | 0.4646 | -304.1916 | -253.2738 | -1.3745 | -1.4014 |
- | 0.4254 | 1.45 | 8400 | 0.6342 | -1.7488 | -2.1805 | 0.6589 | 0.4317 | -281.2074 | -233.5818 | -1.4272 | -1.4531 |
- | 0.4177 | 1.46 | 8500 | 0.6338 | -1.7052 | -2.1243 | 0.6589 | 0.4190 | -275.5844 | -229.2278 | -1.4477 | -1.4731 |
- | 0.4537 | 1.48 | 8600 | 0.6325 | -1.8512 | -2.2974 | 0.6678 | 0.4461 | -292.8940 | -243.8274 | -1.4197 | -1.4457 |
- | 0.4176 | 1.5 | 8700 | 0.6308 | -1.7305 | -2.1647 | 0.6654 | 0.4342 | -279.6241 | -231.7505 | -1.4491 | -1.4751 |
- | 0.4486 | 1.52 | 8800 | 0.6291 | -1.7428 | -2.1782 | 0.6694 | 0.4354 | -280.9822 | -232.9864 | -1.4555 | -1.4813 |
- | 0.3594 | 1.53 | 8900 | 0.6299 | -1.9280 | -2.3996 | 0.6675 | 0.4716 | -303.1151 | -251.5025 | -1.4002 | -1.4271 |
- | 0.4428 | 1.55 | 9000 | 0.6319 | -1.8919 | -2.3581 | 0.6643 | 0.4663 | -298.9696 | -247.8895 | -1.4093 | -1.4361 |
- | 0.4441 | 1.57 | 9100 | 0.6315 | -1.7822 | -2.2239 | 0.6671 | 0.4418 | -285.5493 | -236.9199 | -1.4335 | -1.4596 |
- | 0.3898 | 1.59 | 9200 | 0.6316 | -1.7689 | -2.2103 | 0.6657 | 0.4414 | -284.1919 | -235.5972 | -1.4175 | -1.4437 |
- | 0.3657 | 1.6 | 9300 | 0.6326 | -1.8070 | -2.2549 | 0.6638 | 0.4480 | -288.6493 | -239.3994 | -1.4099 | -1.4361 |
- | 0.4666 | 1.62 | 9400 | 0.6325 | -1.7984 | -2.2467 | 0.6631 | 0.4483 | -287.8304 | -238.5475 | -1.4113 | -1.4377 |
- | 0.3503 | 1.64 | 9500 | 0.6340 | -1.9330 | -2.4089 | 0.6587 | 0.4759 | -304.0439 | -252.0053 | -1.3757 | -1.4028 |
- | 0.3729 | 1.65 | 9600 | 0.6357 | -1.9359 | -2.4150 | 0.6564 | 0.4791 | -304.6583 | -252.2943 | -1.3641 | -1.3914 |
- | 0.4403 | 1.67 | 9700 | 0.6342 | -1.8602 | -2.3254 | 0.6624 | 0.4652 | -295.6944 | -244.7219 | -1.3903 | -1.4172 |
- | 0.3633 | 1.69 | 9800 | 0.6346 | -1.8563 | -2.3208 | 0.6589 | 0.4644 | -295.2367 | -244.3386 | -1.3928 | -1.4199 |
- | 0.3727 | 1.71 | 9900 | 0.6336 | -1.8765 | -2.3444 | 0.6557 | 0.4679 | -297.6013 | -246.3585 | -1.3978 | -1.4249 |
- | 0.424 | 1.72 | 10000 | 0.6344 | -1.8698 | -2.3349 | 0.6515 | 0.4650 | -296.6436 | -245.6855 | -1.3958 | -1.4226 |
- | 0.3867 | 1.74 | 10100 | 0.6348 | -1.8396 | -2.2973 | 0.6610 | 0.4578 | -292.8903 | -242.6608 | -1.4014 | -1.4282 |
- | 0.3851 | 1.76 | 10200 | 0.6358 | -1.9589 | -2.4446 | 0.6608 | 0.4858 | -307.6222 | -254.5927 | -1.3697 | -1.3974 |
- | 0.4322 | 1.77 | 10300 | 0.6352 | -1.9333 | -2.4122 | 0.6585 | 0.4788 | -304.3728 | -252.0376 | -1.3729 | -1.4002 |
- | 0.3405 | 1.79 | 10400 | 0.6352 | -1.8857 | -2.3538 | 0.6608 | 0.4681 | -298.5337 | -247.2695 | -1.3844 | -1.4115 |
- | 0.424 | 1.81 | 10500 | 0.6351 | -1.8775 | -2.3439 | 0.6599 | 0.4665 | -297.5495 | -246.4502 | -1.3843 | -1.4113 |
- | 0.4396 | 1.83 | 10600 | 0.6350 | -1.8749 | -2.3405 | 0.6568 | 0.4655 | -297.2035 | -246.1965 | -1.3876 | -1.4146 |
- | 0.3908 | 1.84 | 10700 | 0.6334 | -1.8434 | -2.3045 | 0.6564 | 0.4611 | -293.6068 | -243.0424 | -1.3944 | -1.4212 |
- | 0.4273 | 1.86 | 10800 | 0.6342 | -1.8539 | -2.3151 | 0.6624 | 0.4611 | -294.6657 | -244.0978 | -1.3926 | -1.4194 |
- | 0.3762 | 1.88 | 10900 | 0.6346 | -1.8597 | -2.3213 | 0.6566 | 0.4616 | -295.2873 | -244.6704 | -1.3904 | -1.4173 |
- | 0.4734 | 1.9 | 11000 | 0.6339 | -1.8518 | -2.3137 | 0.6629 | 0.4619 | -294.5248 | -243.8795 | -1.3920 | -1.4190 |
- | 0.4333 | 1.91 | 11100 | 0.6333 | -1.8546 | -2.3184 | 0.6599 | 0.4638 | -294.9983 | -244.1649 | -1.3921 | -1.4190 |
- | 0.4305 | 1.93 | 11200 | 0.6335 | -1.8468 | -2.3074 | 0.6564 | 0.4606 | -293.8987 | -243.3866 | -1.3953 | -1.4221 |
- | 0.4817 | 1.95 | 11300 | 0.6343 | -1.8562 | -2.3189 | 0.6573 | 0.4627 | -295.0477 | -244.3265 | -1.3934 | -1.4203 |
- | 0.4146 | 1.96 | 11400 | 0.6339 | -1.8573 | -2.3207 | 0.6559 | 0.4634 | -295.2255 | -244.4316 | -1.3909 | -1.4179 |
- | 0.432 | 1.98 | 11500 | 0.6337 | -1.8547 | -2.3184 | 0.6536 | 0.4637 | -295.0010 | -244.1783 | -1.3914 | -1.4183 |
- | 0.429 | 2.0 | 11600 | 0.6342 | -1.8568 | -2.3204 | 0.6580 | 0.4635 | -295.1929 | -244.3875 | -1.3920 | -1.4190 |


  ### Framework versions

- - Transformers 4.39.3
  - Pytorch 2.1.2
- - Datasets 2.18.0
- - Tokenizers 0.15.2
 
  license: apache-2.0
  base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
  tags:
  - trl
  - dpo
  - generated_from_trainer
  model-index:
  - name: tinyllama-1.1b-sum-dpo-full
  results: []
 

  # tinyllama-1.1b-sum-dpo-full

+ This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on an unknown dataset.
  It achieves the following results on the evaluation set:
+ - Loss: 0.6549
+ - Rewards/chosen: -0.4976
+ - Rewards/rejected: -0.6010
+ - Rewards/accuracies: 0.6194
+ - Rewards/margins: 0.1035
+ - Logps/rejected: -123.2810
+ - Logps/chosen: -108.4673
+ - Logits/rejected: -2.5516
+ - Logits/chosen: -2.5584

  ## Model description

 
  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 1e-07
  - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
 

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6932 | 0.0172 | 100 | 0.6932 | 0.0000 | 0.0001 | 0.4819 | -0.0001 | -63.1720 | -58.7099 | -3.1572 | -3.1629 |
+ | 0.6931 | 0.0345 | 200 | 0.6932 | 0.0000 | 0.0001 | 0.4893 | -0.0001 | -63.1716 | -58.7118 | -3.1576 | -3.1632 |
+ | 0.6932 | 0.0517 | 300 | 0.6932 | 0.0000 | 0.0001 | 0.4696 | -0.0001 | -63.1677 | -58.7096 | -3.1575 | -3.1631 |
+ | 0.6933 | 0.0689 | 400 | 0.6932 | 0.0002 | 0.0002 | 0.4844 | -0.0000 | -63.1572 | -58.6929 | -3.1574 | -3.1631 |
+ | 0.6931 | 0.0861 | 500 | 0.6931 | 0.0002 | 0.0002 | 0.5016 | 0.0000 | -63.1582 | -58.6892 | -3.1571 | -3.1628 |
+ | 0.6925 | 0.1034 | 600 | 0.6931 | 0.0004 | 0.0003 | 0.5158 | 0.0002 | -63.1507 | -58.6671 | -3.1566 | -3.1623 |
+ | 0.6927 | 0.1206 | 700 | 0.6931 | 0.0006 | 0.0004 | 0.5276 | 0.0002 | -63.1420 | -58.6550 | -3.1556 | -3.1612 |
+ | 0.6924 | 0.1378 | 800 | 0.6929 | 0.0010 | 0.0006 | 0.5509 | 0.0005 | -63.1244 | -58.6089 | -3.1546 | -3.1601 |
+ | 0.692 | 0.1551 | 900 | 0.6928 | 0.0014 | 0.0007 | 0.5534 | 0.0007 | -63.1085 | -58.5690 | -3.1524 | -3.1580 |
+ | 0.6924 | 0.1723 | 1000 | 0.6926 | 0.0018 | 0.0007 | 0.5660 | 0.0011 | -63.1097 | -58.5334 | -3.1494 | -3.1550 |
+ | 0.6913 | 0.1895 | 1100 | 0.6924 | 0.0021 | 0.0005 | 0.5737 | 0.0016 | -63.1303 | -58.5028 | -3.1458 | -3.1514 |
+ | 0.6912 | 0.2068 | 1200 | 0.6921 | 0.0022 | 0.0001 | 0.5795 | 0.0021 | -63.1677 | -58.4881 | -3.1407 | -3.1464 |
+ | 0.6911 | 0.2240 | 1300 | 0.6918 | 0.0017 | -0.0011 | 0.5901 | 0.0028 | -63.2892 | -58.5372 | -3.1358 | -3.1414 |
+ | 0.6871 | 0.2412 | 1400 | 0.6914 | 0.0006 | -0.0031 | 0.5785 | 0.0037 | -63.4895 | -58.6491 | -3.1300 | -3.1356 |
+ | 0.6866 | 0.2584 | 1500 | 0.6910 | -0.0015 | -0.0061 | 0.5750 | 0.0045 | -63.7853 | -58.8661 | -3.1246 | -3.1303 |
+ | 0.6876 | 0.2757 | 1600 | 0.6907 | -0.0038 | -0.0091 | 0.5874 | 0.0053 | -64.0863 | -59.0928 | -3.1185 | -3.1241 |
+ | 0.6882 | 0.2929 | 1700 | 0.6903 | -0.0067 | -0.0126 | 0.5850 | 0.0060 | -64.4449 | -59.3800 | -3.1117 | -3.1173 |
+ | 0.6838 | 0.3101 | 1800 | 0.6900 | -0.0121 | -0.0190 | 0.5825 | 0.0069 | -65.0772 | -59.9201 | -3.1038 | -3.1095 |
+ | 0.6836 | 0.3274 | 1900 | 0.6895 | -0.0157 | -0.0235 | 0.5883 | 0.0078 | -65.5277 | -60.2801 | -3.0980 | -3.1037 |
+ | 0.685 | 0.3446 | 2000 | 0.6889 | -0.0227 | -0.0319 | 0.5897 | 0.0092 | -66.3702 | -60.9847 | -3.0905 | -3.0962 |
+ | 0.6828 | 0.3618 | 2100 | 0.6883 | -0.0311 | -0.0418 | 0.5806 | 0.0107 | -67.3595 | -61.8209 | -3.0840 | -3.0897 |
+ | 0.6745 | 0.3790 | 2200 | 0.6876 | -0.0382 | -0.0504 | 0.5883 | 0.0123 | -68.2227 | -62.5273 | -3.0753 | -3.0811 |
+ | 0.6781 | 0.3963 | 2300 | 0.6872 | -0.0405 | -0.0537 | 0.5908 | 0.0131 | -68.5468 | -62.7638 | -3.0689 | -3.0745 |
+ | 0.6809 | 0.4135 | 2400 | 0.6866 | -0.0471 | -0.0615 | 0.5906 | 0.0144 | -69.3305 | -63.4208 | -3.0592 | -3.0649 |
+ | 0.6828 | 0.4307 | 2500 | 0.6862 | -0.0557 | -0.0713 | 0.5913 | 0.0156 | -70.3087 | -64.2813 | -3.0501 | -3.0558 |
+ | 0.6754 | 0.4480 | 2600 | 0.6856 | -0.0615 | -0.0783 | 0.5918 | 0.0168 | -71.0083 | -64.8584 | -3.0433 | -3.0490 |
+ | 0.6768 | 0.4652 | 2700 | 0.6851 | -0.0674 | -0.0853 | 0.5957 | 0.0180 | -71.7136 | -65.4475 | -3.0370 | -3.0427 |
+ | 0.6766 | 0.4824 | 2800 | 0.6846 | -0.0727 | -0.0919 | 0.5967 | 0.0192 | -72.3669 | -65.9771 | -3.0308 | -3.0365 |
+ | 0.6769 | 0.4997 | 2900 | 0.6843 | -0.0755 | -0.0954 | 0.6004 | 0.0199 | -72.7197 | -66.2619 | -3.0232 | -3.0289 |
+ | 0.6781 | 0.5169 | 3000 | 0.6839 | -0.0812 | -0.1022 | 0.6027 | 0.0210 | -73.3995 | -66.8329 | -3.0144 | -3.0201 |
+ | 0.67 | 0.5341 | 3100 | 0.6835 | -0.0822 | -0.1040 | 0.6004 | 0.0218 | -73.5753 | -66.9287 | -3.0095 | -3.0153 |
+ | 0.6718 | 0.5513 | 3200 | 0.6828 | -0.0939 | -0.1173 | 0.6015 | 0.0235 | -74.9148 | -68.1005 | -2.9982 | -3.0040 |
+ | 0.6724 | 0.5686 | 3300 | 0.6822 | -0.0999 | -0.1249 | 0.6050 | 0.0250 | -75.6694 | -68.7027 | -2.9851 | -2.9908 |
+ | 0.6625 | 0.5858 | 3400 | 0.6818 | -0.1009 | -0.1266 | 0.6090 | 0.0257 | -75.8440 | -68.8060 | -2.9762 | -2.9820 |
+ | 0.6742 | 0.6030 | 3500 | 0.6814 | -0.1071 | -0.1338 | 0.6083 | 0.0267 | -76.5617 | -69.4202 | -2.9687 | -2.9745 |
+ | 0.6722 | 0.6203 | 3600 | 0.6810 | -0.1126 | -0.1404 | 0.6099 | 0.0277 | -77.2155 | -69.9734 | -2.9597 | -2.9655 |
+ | 0.664 | 0.6375 | 3700 | 0.6803 | -0.1209 | -0.1502 | 0.6090 | 0.0293 | -78.2040 | -70.8018 | -2.9485 | -2.9543 |
+ | 0.6644 | 0.6547 | 3800 | 0.6795 | -0.1327 | -0.1641 | 0.6111 | 0.0314 | -79.5918 | -71.9851 | -2.9386 | -2.9444 |
+ | 0.6664 | 0.6720 | 3900 | 0.6786 | -0.1449 | -0.1784 | 0.6080 | 0.0335 | -81.0222 | -73.2044 | -2.9300 | -2.9358 |
+ | 0.6653 | 0.6892 | 4000 | 0.6781 | -0.1559 | -0.1909 | 0.6057 | 0.0350 | -82.2692 | -74.3040 | -2.9178 | -2.9236 |
+ | 0.6532 | 0.7064 | 4100 | 0.6776 | -0.1612 | -0.1975 | 0.6125 | 0.0363 | -82.9296 | -74.8363 | -2.9005 | -2.9064 |
+ | 0.6733 | 0.7236 | 4200 | 0.6769 | -0.1720 | -0.2098 | 0.6087 | 0.0378 | -84.1639 | -75.9119 | -2.8890 | -2.8949 |
+ | 0.6618 | 0.7409 | 4300 | 0.6764 | -0.1798 | -0.2189 | 0.6057 | 0.0391 | -85.0723 | -76.6940 | -2.8794 | -2.8853 |
+ | 0.6625 | 0.7581 | 4400 | 0.6757 | -0.1936 | -0.2347 | 0.6053 | 0.0411 | -86.6464 | -78.0713 | -2.8686 | -2.8745 |
+ | 0.6605 | 0.7753 | 4500 | 0.6746 | -0.2097 | -0.2535 | 0.6066 | 0.0439 | -88.5342 | -79.6776 | -2.8590 | -2.8649 |
+ | 0.6437 | 0.7926 | 4600 | 0.6737 | -0.2242 | -0.2703 | 0.6071 | 0.0461 | -90.2150 | -81.1344 | -2.8513 | -2.8573 |
+ | 0.6526 | 0.8098 | 4700 | 0.6727 | -0.2385 | -0.2872 | 0.6069 | 0.0487 | -91.9046 | -82.5646 | -2.8429 | -2.8489 |
+ | 0.6604 | 0.8270 | 4800 | 0.6721 | -0.2495 | -0.2999 | 0.6090 | 0.0504 | -93.1696 | -83.6594 | -2.8351 | -2.8410 |
+ | 0.6664 | 0.8442 | 4900 | 0.6712 | -0.2621 | -0.3148 | 0.6048 | 0.0526 | -94.6595 | -84.9266 | -2.8264 | -2.8324 |
+ | 0.6499 | 0.8615 | 5000 | 0.6707 | -0.2706 | -0.3247 | 0.5955 | 0.0541 | -95.6483 | -85.7703 | -2.8111 | -2.8172 |
+ | 0.6628 | 0.8787 | 5100 | 0.6697 | -0.2843 | -0.3411 | 0.5969 | 0.0568 | -97.2923 | -87.1431 | -2.8035 | -2.8094 |
+ | 0.6513 | 0.8959 | 5200 | 0.6693 | -0.2867 | -0.3444 | 0.5953 | 0.0577 | -97.6222 | -87.3824 | -2.7972 | -2.8031 |
+ | 0.6475 | 0.9132 | 5300 | 0.6692 | -0.2901 | -0.3484 | 0.5987 | 0.0583 | -98.0213 | -87.7248 | -2.7882 | -2.7943 |
+ | 0.6494 | 0.9304 | 5400 | 0.6687 | -0.2940 | -0.3536 | 0.6015 | 0.0596 | -98.5368 | -88.1090 | -2.7827 | -2.7887 |
+ | 0.6412 | 0.9476 | 5500 | 0.6682 | -0.3024 | -0.3635 | 0.5997 | 0.0610 | -99.5251 | -88.9533 | -2.7734 | -2.7794 |
+ | 0.6531 | 0.9649 | 5600 | 0.6680 | -0.2995 | -0.3610 | 0.6046 | 0.0615 | -99.2758 | -88.6585 | -2.7683 | -2.7743 |
+ | 0.652 | 0.9821 | 5700 | 0.6671 | -0.3121 | -0.3760 | 0.6041 | 0.0639 | -100.7801 | -89.9234 | -2.7604 | -2.7664 |
+ | 0.6355 | 0.9993 | 5800 | 0.6663 | -0.3272 | -0.3936 | 0.6057 | 0.0664 | -102.5409 | -91.4366 | -2.7489 | -2.7549 |
+ | 0.6362 | 1.0165 | 5900 | 0.6654 | -0.3504 | -0.4199 | 0.6043 | 0.0695 | -105.1658 | -93.7475 | -2.7329 | -2.7390 |
+ | 0.6587 | 1.0338 | 6000 | 0.6654 | -0.3453 | -0.4145 | 0.6076 | 0.0692 | -104.6326 | -93.2431 | -2.7260 | -2.7321 |
+ | 0.6337 | 1.0510 | 6100 | 0.6649 | -0.3492 | -0.4197 | 0.6078 | 0.0705 | -105.1470 | -93.6331 | -2.7177 | -2.7237 |
+ | 0.6372 | 1.0682 | 6200 | 0.6640 | -0.3675 | -0.4408 | 0.6090 | 0.0734 | -107.2651 | -95.4612 | -2.7083 | -2.7144 |
+ | 0.6555 | 1.0855 | 6300 | 0.6633 | -0.3808 | -0.4563 | 0.6111 | 0.0755 | -108.8140 | -96.7948 | -2.7009 | -2.7071 |
+ | 0.6406 | 1.1027 | 6400 | 0.6629 | -0.3843 | -0.4611 | 0.6108 | 0.0768 | -109.2905 | -97.1394 | -2.6941 | -2.7003 |
+ | 0.6445 | 1.1199 | 6500 | 0.6626 | -0.3894 | -0.4670 | 0.6097 | 0.0776 | -109.8768 | -97.6507 | -2.6860 | -2.6923 |
+ | 0.6438 | 1.1371 | 6600 | 0.6627 | -0.3907 | -0.4683 | 0.6073 | 0.0776 | -110.0129 | -97.7839 | -2.6814 | -2.6877 |
+ | 0.6411 | 1.1544 | 6700 | 0.6622 | -0.3996 | -0.4791 | 0.6122 | 0.0795 | -111.0866 | -98.6695 | -2.6729 | -2.6791 |
+ | 0.6224 | 1.1716 | 6800 | 0.6614 | -0.4163 | -0.4982 | 0.6115 | 0.0819 | -112.9988 | -100.3370 | -2.6625 | -2.6688 |
+ | 0.6437 | 1.1888 | 6900 | 0.6610 | -0.4232 | -0.5064 | 0.6106 | 0.0832 | -113.8220 | -101.0292 | -2.6554 | -2.6618 |
+ | 0.6268 | 1.2061 | 7000 | 0.6604 | -0.4419 | -0.5278 | 0.6090 | 0.0859 | -115.9616 | -102.9045 | -2.6490 | -2.6553 |
+ | 0.6303 | 1.2233 | 7100 | 0.6604 | -0.4379 | -0.5238 | 0.6129 | 0.0859 | -115.5604 | -102.5041 | -2.6443 | -2.6506 |
+ | 0.6251 | 1.2405 | 7200 | 0.6600 | -0.4437 | -0.5309 | 0.6101 | 0.0872 | -116.2726 | -103.0814 | -2.6383 | -2.6448 |
+ | 0.6531 | 1.2578 | 7300 | 0.6602 | -0.4339 | -0.5202 | 0.6125 | 0.0863 | -115.1998 | -102.0999 | -2.6366 | -2.6430 |
+ | 0.6456 | 1.2750 | 7400 | 0.6600 | -0.4313 | -0.5180 | 0.6125 | 0.0867 | -114.9813 | -101.8414 | -2.6345 | -2.6409 |
+ | 0.6455 | 1.2922 | 7500 | 0.6597 | -0.4307 | -0.5180 | 0.6148 | 0.0873 | -114.9807 | -101.7862 | -2.6292 | -2.6357 |
+ | 0.6762 | 1.3094 | 7600 | 0.6593 | -0.4392 | -0.5278 | 0.6118 | 0.0887 | -115.9649 | -102.6288 | -2.6216 | -2.6281 |
+ | 0.6365 | 1.3267 | 7700 | 0.6592 | -0.4402 | -0.5295 | 0.6157 | 0.0893 | -116.1288 | -102.7343 | -2.6172 | -2.6237 |
+ | 0.6211 | 1.3439 | 7800 | 0.6588 | -0.4484 | -0.5389 | 0.6194 | 0.0906 | -117.0741 | -103.5481 | -2.6115 | -2.6180 |
+ | 0.641 | 1.3611 | 7900 | 0.6581 | -0.4553 | -0.5479 | 0.6217 | 0.0926 | -117.9735 | -104.2409 | -2.6077 | -2.6143 |
+ | 0.6228 | 1.3784 | 8000 | 0.6578 | -0.4583 | -0.5520 | 0.6215 | 0.0937 | -118.3795 | -104.5455 | -2.6043 | -2.6109 |
+ | 0.641 | 1.3956 | 8100 | 0.6579 | -0.4658 | -0.5596 | 0.6178 | 0.0939 | -119.1444 | -105.2910 | -2.5997 | -2.6063 |
+ | 0.6504 | 1.4128 | 8200 | 0.6571 | -0.4707 | -0.5666 | 0.6213 | 0.0959 | -119.8413 | -105.7863 | -2.5974 | -2.6040 |
+ | 0.6472 | 1.4300 | 8300 | 0.6573 | -0.4661 | -0.5612 | 0.6217 | 0.0951 | -119.3045 | -105.3220 | -2.5953 | -2.6018 |
+ | 0.6298 | 1.4473 | 8400 | 0.6573 | -0.4609 | -0.5560 | 0.6206 | 0.0950 | -118.7768 | -104.8056 | -2.5928 | -2.5994 |
+ | 0.6207 | 1.4645 | 8500 | 0.6573 | -0.4579 | -0.5531 | 0.6180 | 0.0952 | -118.4887 | -104.5014 | -2.5885 | -2.5952 |
+ | 0.6661 | 1.4817 | 8600 | 0.6571 | -0.4639 | -0.5598 | 0.6204 | 0.0959 | -119.1632 | -105.1053 | -2.5846 | -2.5913 |
+ | 0.6475 | 1.4990 | 8700 | 0.6572 | -0.4570 | -0.5525 | 0.6190 | 0.0954 | -118.4251 | -104.4133 | -2.5846 | -2.5912 |
+ | 0.6476 | 1.5162 | 8800 | 0.6569 | -0.4604 | -0.5566 | 0.6194 | 0.0962 | -118.8439 | -104.7545 | -2.5816 | -2.5883 |
+ | 0.6336 | 1.5334 | 8900 | 0.6568 | -0.4692 | -0.5663 | 0.6190 | 0.0971 | -119.8081 | -105.6329 | -2.5772 | -2.5839 |
+ | 0.6282 | 1.5507 | 9000 | 0.6564 | -0.4708 | -0.5690 | 0.6187 | 0.0981 | -120.0761 | -105.7962 | -2.5754 | -2.5821 |
+ | 0.646 | 1.5679 | 9100 | 0.6565 | -0.4724 | -0.5704 | 0.6187 | 0.0980 | -120.2213 | -105.9529 | -2.5732 | -2.5799 |
+ | 0.6225 | 1.5851 | 9200 | 0.6563 | -0.4762 | -0.5749 | 0.6190 | 0.0987 | -120.6733 | -106.3303 | -2.5714 | -2.5781 |
+ | 0.6223 | 1.6023 | 9300 | 0.6562 | -0.4763 | -0.5753 | 0.6180 | 0.0990 | -120.7107 | -106.3383 | -2.5692 | -2.5759 |
+ | 0.6288 | 1.6196 | 9400 | 0.6559 | -0.4818 | -0.5819 | 0.6201 | 0.1001 | -121.3710 | -106.8921 | -2.5664 | -2.5731 |
+ | 0.6223 | 1.6368 | 9500 | 0.6557 | -0.4823 | -0.5828 | 0.6176 | 0.1005 | -121.4601 | -106.9374 | -2.5650 | -2.5717 |
+ | 0.6363 | 1.6540 | 9600 | 0.6556 | -0.4891 | -0.5902 | 0.6197 | 0.1011 | -122.2042 | -107.6243 | -2.5615 | -2.5683 |
+ | 0.6355 | 1.6713 | 9700 | 0.6556 | -0.4880 | -0.5892 | 0.6211 | 0.1012 | -122.1034 | -107.5130 | -2.5609 | -2.5677 |
+ | 0.6247 | 1.6885 | 9800 | 0.6555 | -0.4894 | -0.5910 | 0.6201 | 0.1015 | -122.2755 | -107.6543 | -2.5603 | -2.5670 |
+ | 0.5826 | 1.7057 | 9900 | 0.6554 | -0.4911 | -0.5929 | 0.6206 | 0.1019 | -122.4715 | -107.8182 | -2.5591 | -2.5659 |
+ | 0.6181 | 1.7229 | 10000 | 0.6553 | -0.4923 | -0.5945 | 0.6204 | 0.1022 | -122.6296 | -107.9373 | -2.5579 | -2.5647 |
+ | 0.6365 | 1.7402 | 10100 | 0.6553 | -0.4917 | -0.5938 | 0.6201 | 0.1022 | -122.5635 | -107.8778 | -2.5567 | -2.5635 |
+ | 0.6269 | 1.7574 | 10200 | 0.6552 | -0.4952 | -0.5977 | 0.6208 | 0.1025 | -122.9497 | -108.2321 | -2.5556 | -2.5624 |
+ | 0.6573 | 1.7746 | 10300 | 0.6553 | -0.4962 | -0.5988 | 0.6201 | 0.1026 | -123.0645 | -108.3347 | -2.5542 | -2.5610 |
+ | 0.6036 | 1.7919 | 10400 | 0.6552 | -0.4953 | -0.5980 | 0.6197 | 0.1027 | -122.9784 | -108.2400 | -2.5542 | -2.5610 |
+ | 0.6178 | 1.8091 | 10500 | 0.6549 | -0.4956 | -0.5990 | 0.6213 | 0.1034 | -123.0831 | -108.2757 | -2.5531 | -2.5598 |
+ | 0.6403 | 1.8263 | 10600 | 0.6551 | -0.4967 | -0.5996 | 0.6204 | 0.1030 | -123.1450 | -108.3809 | -2.5527 | -2.5594 |
+ | 0.6341 | 1.8436 | 10700 | 0.6550 | -0.4965 | -0.5997 | 0.6206 | 0.1032 | -123.1496 | -108.3595 | -2.5523 | -2.5590 |
+ | 0.627 | 1.8608 | 10800 | 0.6549 | -0.4971 | -0.6006 | 0.6211 | 0.1035 | -123.2409 | -108.4216 | -2.5521 | -2.5589 |
+ | 0.6335 | 1.8780 | 10900 | 0.6550 | -0.4974 | -0.6009 | 0.6201 | 0.1035 | -123.2728 | -108.4564 | -2.5523 | -2.5590 |
+ | 0.6262 | 1.8952 | 11000 | 0.6550 | -0.4971 | -0.6003 | 0.6201 | 0.1033 | -123.2126 | -108.4185 | -2.5520 | -2.5588 |
+ | 0.6311 | 1.9125 | 11100 | 0.6548 | -0.4971 | -0.6009 | 0.6211 | 0.1038 | -123.2688 | -108.4253 | -2.5521 | -2.5589 |
+ | 0.6239 | 1.9297 | 11200 | 0.6551 | -0.4971 | -0.6003 | 0.6201 | 0.1031 | -123.2061 | -108.4263 | -2.5516 | -2.5583 |
+ | 0.6629 | 1.9469 | 11300 | 0.6550 | -0.4970 | -0.6003 | 0.6206 | 0.1033 | -123.2066 | -108.4107 | -2.5518 | -2.5587 |
+ | 0.6308 | 1.9642 | 11400 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2305 | -108.4360 | -2.5518 | -2.5586 |
+ | 0.6532 | 1.9814 | 11500 | 0.6550 | -0.4972 | -0.6005 | 0.6197 | 0.1033 | -123.2317 | -108.4313 | -2.5517 | -2.5585 |
+ | 0.6257 | 1.9986 | 11600 | 0.6549 | -0.4976 | -0.6010 | 0.6194 | 0.1035 | -123.2810 | -108.4673 | -2.5516 | -2.5584 |


  ### Framework versions

+ - Transformers 4.41.2
  - Pytorch 2.1.2
+ - Datasets 2.19.2
+ - Tokenizers 0.19.1
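The `Loss` and `Rewards/*` columns logged during this DPO run relate to each other through the standard DPO objective: each implicit reward is β times the gap between the policy's and the reference model's log-probability of a response, and the loss is the negative log-sigmoid of the reward margin. A minimal plain-Python sketch of that relationship (β = 0.1 is an assumption, trl's default; the log-probability inputs are hypothetical, chosen only to land on the same scale as the final eval metrics above):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit DPO rewards: beta * (policy logp - reference logp)
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # DPO loss: -log sigmoid(reward margin)
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, chosen_reward, rejected_reward, margin

# Hypothetical per-response log-probabilities (sums over response tokens)
loss, r_chosen, r_rejected, margin = dpo_loss(-108.5, -123.3, -103.5, -117.3)
```

With these inputs the sketch yields rewards of roughly -0.5 and -0.6 and a margin of roughly 0.1, the same scale as the `Rewards/chosen`, `Rewards/rejected`, and `Rewards/margins` values in the final rows of the table.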
all_results.json CHANGED
@@ -1,21 +1,9 @@
  {
  "epoch": 2.0,
- "eval_logits/chosen": -1.4189956188201904,
- "eval_logits/rejected": -1.3920025825500488,
- "eval_logps/chosen": -244.38746643066406,
- "eval_logps/rejected": -295.19293212890625,
- "eval_loss": 0.6342188119888306,
- "eval_rewards/accuracies": 0.6579925417900085,
- "eval_rewards/chosen": -1.8568360805511475,
- "eval_rewards/margins": 0.4635196328163147,
- "eval_rewards/rejected": -2.3203558921813965,
- "eval_runtime": 354.8785,
- "eval_samples": 4304,
- "eval_samples_per_second": 12.128,
- "eval_steps_per_second": 1.516,
- "train_loss": 0.5042109644922366,
- "train_runtime": 89019.0317,
  "train_samples": 92858,
- "train_samples_per_second": 2.086,
- "train_steps_per_second": 0.13
  }
 
  {
  "epoch": 2.0,
+ "total_flos": 0.0,
+ "train_loss": 0.6539983197297005,
+ "train_runtime": 91180.3592,
  "train_samples": 92858,
+ "train_samples_per_second": 2.037,
+ "train_steps_per_second": 0.127
  }
generation_config.json CHANGED
@@ -3,5 +3,5 @@
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
- "transformers_version": "4.39.3"
  }

  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0,
+ "transformers_version": "4.41.2"
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9c39d6b0c79527831c4774420234c68590a75e7edbd760328ecd257641a8f0b1
  size 4400216536

  version https://git-lfs.github.com/spec/v1
+ oid sha256:e41193d635b9fb73349550e5f7c474856643ad970f1621ab365b05e759ca253f
  size 4400216536
runs/Jun05_01-08-42_poseidon/events.out.tfevents.1717550053.poseidon.2879567.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:97849fa99bf47a9931b9471a0d63761ac037a13ff1d193e3d7617a65ff669b82
- size 889519

  version https://git-lfs.github.com/spec/v1
+ oid sha256:fb549b9be2712f973ca69d99a2751332878e2d48e13c43a4fc50455551b1e040
+ size 889873
train_results.json CHANGED
@@ -1,8 +1,9 @@
  {
  "epoch": 2.0,
- "train_loss": 0.5042109644922366,
- "train_runtime": 89019.0317,
  "train_samples": 92858,
- "train_samples_per_second": 2.086,
- "train_steps_per_second": 0.13
  }

  {
  "epoch": 2.0,
+ "total_flos": 0.0,
+ "train_loss": 0.6539983197297005,
+ "train_runtime": 91180.3592,
  "train_samples": 92858,
+ "train_samples_per_second": 2.037,
+ "train_steps_per_second": 0.127
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff