---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter13_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0988
- Num Input Tokens Seen: 66474032
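
For easier interpretation, the reported loss can be converted to perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the helper name `perplexity` is introduced here for illustration only.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Values copied from this card: final evaluation loss and the step-0 validation loss.
final_eval_loss = 1.0988
initial_eval_loss = 1.3909

print(f"final perplexity:   {perplexity(final_eval_loss):.2f}")
print(f"initial perplexity: {perplexity(initial_eval_loss):.2f}")
```

Under that assumption, the final evaluation loss corresponds to a perplexity of roughly 3.0.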

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
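
The relationship between the batch-size settings above can be checked with a short calculation. This is a minimal sketch, not the training script: it assumes a single device (which is consistent with 8 × 16 = 128 but is not stated in this card), and the total step count of 1244 is an estimate derived from the last logged row of the results table (step 1240 at epoch 0.9964). The warmup calculation assumes the ratio is rounded up to a whole number of steps.

```python
import math


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length in optimizer steps, rounding the ratio up (assumed behaviour)."""
    return math.ceil(total_steps * warmup_ratio)


# train_batch_size=8 and gradient_accumulation_steps=16 come from the list above;
# num_devices=1 is an assumption.
total = effective_batch_size(per_device=8, accumulation_steps=16, num_devices=1)
print(total)  # 128, matching total_train_batch_size

# 1244 is an estimate: 1240 steps / 0.9964 epochs, rounded.
estimated_total_steps = round(1240 / 0.9964)
print(estimated_total_steps, warmup_steps(estimated_total_steps, 0.05))
```

With `constant_with_warmup`, the learning rate would ramp up to 8e-06 over that warmup window and then stay fixed for the rest of the epoch.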

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6496        | 0.0040 | 5    | 1.3888          | 269088            |
| 1.593         | 0.0080 | 10   | 1.3738          | 532768            |
| 1.6319        | 0.0121 | 15   | 1.3415          | 797616            |
| 1.4412        | 0.0161 | 20   | 1.2878          | 1059528           |
| 1.4212        | 0.0201 | 25   | 1.2452          | 1330056           |
| 1.3105        | 0.0241 | 30   | 1.2106          | 1595496           |
| 1.2004        | 0.0281 | 35   | 1.1870          | 1856096           |
| 1.122         | 0.0321 | 40   | 1.1980          | 2128096           |
| 0.9857        | 0.0362 | 45   | 1.2116          | 2397312           |
| 0.8123        | 0.0402 | 50   | 1.2458          | 2660648           |
| 0.6974        | 0.0442 | 55   | 1.2866          | 2923440           |
| 0.5779        | 0.0482 | 60   | 1.2544          | 3190904           |
| 0.6053        | 0.0522 | 65   | 1.2958          | 3466056           |
| 0.377         | 0.0562 | 70   | 1.2836          | 3738872           |
| 0.437         | 0.0603 | 75   | 1.2394          | 4008880           |
| 0.2844        | 0.0643 | 80   | 1.2326          | 4270216           |
| 0.2743        | 0.0683 | 85   | 1.2176          | 4534544           |
| 0.2454        | 0.0723 | 90   | 1.2031          | 4797656           |
| 0.3017        | 0.0763 | 95   | 1.2110          | 5064904           |
| 0.2919        | 0.0804 | 100  | 1.1901          | 5325960           |
| 0.2755        | 0.0844 | 105  | 1.1899          | 5588816           |
| 0.2508        | 0.0884 | 110  | 1.1932          | 5859792           |
| 0.2048        | 0.0924 | 115  | 1.1895          | 6124448           |
| 0.1805        | 0.0964 | 120  | 1.1991          | 6394440           |
| 0.2482        | 0.1004 | 125  | 1.1865          | 6660424           |
| 0.2114        | 0.1045 | 130  | 1.1828          | 6925280           |
| 0.2454        | 0.1085 | 135  | 1.1801          | 7192496           |
| 0.2305        | 0.1125 | 140  | 1.1733          | 7456696           |
| 0.1829        | 0.1165 | 145  | 1.1778          | 7723888           |
| 0.2417        | 0.1205 | 150  | 1.1796          | 7998624           |
| 0.1485        | 0.1245 | 155  | 1.1714          | 8271672           |
| 0.1433        | 0.1286 | 160  | 1.1770          | 8546408           |
| 0.2375        | 0.1326 | 165  | 1.1716          | 8816744           |
| 0.1699        | 0.1366 | 170  | 1.1698          | 9086496           |
| 0.1136        | 0.1406 | 175  | 1.1651          | 9346888           |
| 0.1336        | 0.1446 | 180  | 1.1702          | 9619312           |
| 0.1598        | 0.1487 | 185  | 1.1609          | 9885952           |
| 0.0921        | 0.1527 | 190  | 1.1622          | 10153872          |
| 0.2749        | 0.1567 | 195  | 1.1658          | 10421200          |
| 0.2119        | 0.1607 | 200  | 1.1574          | 10694680          |
| 0.2545        | 0.1647 | 205  | 1.1574          | 10966232          |
| 0.242         | 0.1687 | 210  | 1.1530          | 11232608          |
| 0.1785        | 0.1728 | 215  | 1.1555          | 11495504          |
| 0.2243        | 0.1768 | 220  | 1.1555          | 11761088          |
| 0.257         | 0.1808 | 225  | 1.1501          | 12034208          |
| 0.1593        | 0.1848 | 230  | 1.1525          | 12297864          |
| 0.2022        | 0.1888 | 235  | 1.1533          | 12565760          |
| 0.2072        | 0.1928 | 240  | 1.1519          | 12833240          |
| 0.1091        | 0.1969 | 245  | 1.1511          | 13102024          |
| 0.0845        | 0.2009 | 250  | 1.1520          | 13362240          |
| 0.2093        | 0.2049 | 255  | 1.1502          | 13625696          |
| 0.1741        | 0.2089 | 260  | 1.1467          | 13890168          |
| 0.1188        | 0.2129 | 265  | 1.1540          | 14154448          |
| 0.3031        | 0.2170 | 270  | 1.1497          | 14419648          |
| 0.1891        | 0.2210 | 275  | 1.1464          | 14674784          |
| 0.2016        | 0.2250 | 280  | 1.1447          | 14949488          |
| 0.1007        | 0.2290 | 285  | 1.1460          | 15214800          |
| 0.1779        | 0.2330 | 290  | 1.1475          | 15483240          |
| 0.195         | 0.2370 | 295  | 1.1398          | 15751536          |
| 0.2069        | 0.2411 | 300  | 1.1429          | 16014584          |
| 0.1597        | 0.2451 | 305  | 1.1420          | 16277072          |
| 0.111         | 0.2491 | 310  | 1.1397          | 16540864          |
| 0.107         | 0.2531 | 315  | 1.1423          | 16804568          |
| 0.1212        | 0.2571 | 320  | 1.1387          | 17077128          |
| 0.1412        | 0.2611 | 325  | 1.1382          | 17348320          |
| 0.1192        | 0.2652 | 330  | 1.1419          | 17612576          |
| 0.1879        | 0.2692 | 335  | 1.1388          | 17876784          |
| 0.1433        | 0.2732 | 340  | 1.1362          | 18142544          |
| 0.1748        | 0.2772 | 345  | 1.1411          | 18415672          |
| 0.1677        | 0.2812 | 350  | 1.1373          | 18683536          |
| 0.1358        | 0.2853 | 355  | 1.1346          | 18952888          |
| 0.1712        | 0.2893 | 360  | 1.1369          | 19218360          |
| 0.1619        | 0.2933 | 365  | 1.1386          | 19483840          |
| 0.1071        | 0.2973 | 370  | 1.1347          | 19756976          |
| 0.2192        | 0.3013 | 375  | 1.1322          | 20022776          |
| 0.1235        | 0.3053 | 380  | 1.1334          | 20289712          |
| 0.2287        | 0.3094 | 385  | 1.1345          | 20559104          |
| 0.1922        | 0.3134 | 390  | 1.1295          | 20823864          |
| 0.1379        | 0.3174 | 395  | 1.1306          | 21082544          |
| 0.109         | 0.3214 | 400  | 1.1325          | 21356280          |
| 0.1387        | 0.3254 | 405  | 1.1298          | 21630688          |
| 0.1094        | 0.3294 | 410  | 1.1290          | 21895440          |
| 0.1573        | 0.3335 | 415  | 1.1295          | 22163328          |
| 0.1252        | 0.3375 | 420  | 1.1275          | 22422360          |
| 0.1323        | 0.3415 | 425  | 1.1309          | 22693992          |
| 0.1553        | 0.3455 | 430  | 1.1275          | 22960416          |
| 0.0841        | 0.3495 | 435  | 1.1282          | 23224648          |
| 0.1479        | 0.3536 | 440  | 1.1303          | 23485960          |
| 0.1776        | 0.3576 | 445  | 1.1319          | 23757080          |
| 0.1108        | 0.3616 | 450  | 1.1295          | 24019992          |
| 0.1577        | 0.3656 | 455  | 1.1281          | 24283712          |
| 0.1419        | 0.3696 | 460  | 1.1281          | 24555736          |
| 0.1669        | 0.3736 | 465  | 1.1274          | 24819064          |
| 0.175         | 0.3777 | 470  | 1.1248          | 25091464          |
| 0.1287        | 0.3817 | 475  | 1.1257          | 25360944          |
| 0.1303        | 0.3857 | 480  | 1.1300          | 25627840          |
| 0.2149        | 0.3897 | 485  | 1.1238          | 25895920          |
| 0.1754        | 0.3937 | 490  | 1.1214          | 26159488          |
| 0.1381        | 0.3978 | 495  | 1.1240          | 26425400          |
| 0.1971        | 0.4018 | 500  | 1.1243          | 26695288          |
| 0.1112        | 0.4058 | 505  | 1.1231          | 26958128          |
| 0.1507        | 0.4098 | 510  | 1.1190          | 27224768          |
| 0.2245        | 0.4138 | 515  | 1.1196          | 27490376          |
| 0.1332        | 0.4178 | 520  | 1.1214          | 27759472          |
| 0.2522        | 0.4219 | 525  | 1.1237          | 28021432          |
| 0.1485        | 0.4259 | 530  | 1.1195          | 28293960          |
| 0.1108        | 0.4299 | 535  | 1.1196          | 28565520          |
| 0.1354        | 0.4339 | 540  | 1.1205          | 28830248          |
| 0.188         | 0.4379 | 545  | 1.1186          | 29098632          |
| 0.1505        | 0.4419 | 550  | 1.1169          | 29366008          |
| 0.2583        | 0.4460 | 555  | 1.1186          | 29631632          |
| 0.1734        | 0.4500 | 560  | 1.1181          | 29892432          |
| 0.1396        | 0.4540 | 565  | 1.1191          | 30155064          |
| 0.147         | 0.4580 | 570  | 1.1185          | 30425328          |
| 0.1781        | 0.4620 | 575  | 1.1157          | 30687912          |
| 0.087         | 0.4661 | 580  | 1.1194          | 30955536          |
| 0.1667        | 0.4701 | 585  | 1.1211          | 31223528          |
| 0.2041        | 0.4741 | 590  | 1.1164          | 31486616          |
| 0.1368        | 0.4781 | 595  | 1.1163          | 31756680          |
| 0.1193        | 0.4821 | 600  | 1.1166          | 32029360          |
| 0.1863        | 0.4861 | 605  | 1.1142          | 32300840          |
| 0.1692        | 0.4902 | 610  | 1.1145          | 32559992          |
| 0.1551        | 0.4942 | 615  | 1.1158          | 32820160          |
| 0.1233        | 0.4982 | 620  | 1.1139          | 33090856          |
| 0.2353        | 0.5022 | 625  | 1.1132          | 33356216          |
| 0.0917        | 0.5062 | 630  | 1.1161          | 33627544          |
| 0.1523        | 0.5102 | 635  | 1.1159          | 33898952          |
| 0.1818        | 0.5143 | 640  | 1.1135          | 34166040          |
| 0.0914        | 0.5183 | 645  | 1.1139          | 34432080          |
| 0.1609        | 0.5223 | 650  | 1.1142          | 34695128          |
| 0.1164        | 0.5263 | 655  | 1.1137          | 34960016          |
| 0.1476        | 0.5303 | 660  | 1.1127          | 35227024          |
| 0.1514        | 0.5344 | 665  | 1.1138          | 35502752          |
| 0.1921        | 0.5384 | 670  | 1.1135          | 35777480          |
| 0.1547        | 0.5424 | 675  | 1.1111          | 36051128          |
| 0.1647        | 0.5464 | 680  | 1.1128          | 36324632          |
| 0.1431        | 0.5504 | 685  | 1.1132          | 36599048          |
| 0.1537        | 0.5544 | 690  | 1.1113          | 36868312          |
| 0.1508        | 0.5585 | 695  | 1.1119          | 37137304          |
| 0.1446        | 0.5625 | 700  | 1.1121          | 37400984          |
| 0.1871        | 0.5665 | 705  | 1.1104          | 37670160          |
| 0.1148        | 0.5705 | 710  | 1.1093          | 37937456          |
| 0.1809        | 0.5745 | 715  | 1.1107          | 38213656          |
| 0.1562        | 0.5785 | 720  | 1.1134          | 38481208          |
| 0.1856        | 0.5826 | 725  | 1.1124          | 38748528          |
| 0.2117        | 0.5866 | 730  | 1.1110          | 39014688          |
| 0.1334        | 0.5906 | 735  | 1.1086          | 39285112          |
| 0.1282        | 0.5946 | 740  | 1.1083          | 39558336          |
| 0.1079        | 0.5986 | 745  | 1.1078          | 39816608          |
| 0.2084        | 0.6027 | 750  | 1.1080          | 40081864          |
| 0.1388        | 0.6067 | 755  | 1.1099          | 40349832          |
| 0.1496        | 0.6107 | 760  | 1.1095          | 40617056          |
| 0.123         | 0.6147 | 765  | 1.1066          | 40887032          |
| 0.0792        | 0.6187 | 770  | 1.1065          | 41148104          |
| 0.1639        | 0.6227 | 775  | 1.1086          | 41423424          |
| 0.2501        | 0.6268 | 780  | 1.1078          | 41700288          |
| 0.115         | 0.6308 | 785  | 1.1090          | 41971832          |
| 0.1738        | 0.6348 | 790  | 1.1083          | 42239944          |
| 0.1595        | 0.6388 | 795  | 1.1061          | 42497488          |
| 0.1121        | 0.6428 | 800  | 1.1059          | 42763824          |
| 0.1503        | 0.6468 | 805  | 1.1075          | 43033424          |
| 0.0887        | 0.6509 | 810  | 1.1048          | 43299520          |
| 0.1208        | 0.6549 | 815  | 1.1063          | 43567272          |
| 0.1165        | 0.6589 | 820  | 1.1090          | 43830216          |
| 0.136         | 0.6629 | 825  | 1.1080          | 44101312          |
| 0.1441        | 0.6669 | 830  | 1.1059          | 44372208          |
| 0.1372        | 0.6710 | 835  | 1.1074          | 44629960          |
| 0.0905        | 0.6750 | 840  | 1.1078          | 44894304          |
| 0.17          | 0.6790 | 845  | 1.1058          | 45163432          |
| 0.1861        | 0.6830 | 850  | 1.1047          | 45430264          |
| 0.1535        | 0.6870 | 855  | 1.1053          | 45705032          |
| 0.2079        | 0.6910 | 860  | 1.1057          | 45973272          |
| 0.1795        | 0.6951 | 865  | 1.1057          | 46238200          |
| 0.1819        | 0.6991 | 870  | 1.1061          | 46508080          |
| 0.1625        | 0.7031 | 875  | 1.1057          | 46775056          |
| 0.157         | 0.7071 | 880  | 1.1041          | 47045584          |
| 0.1586        | 0.7111 | 885  | 1.1041          | 47315400          |
| 0.1219        | 0.7151 | 890  | 1.1043          | 47581088          |
| 0.1534        | 0.7192 | 895  | 1.1045          | 47844512          |
| 0.1423        | 0.7232 | 900  | 1.1032          | 48114328          |
| 0.1358        | 0.7272 | 905  | 1.1040          | 48380520          |
| 0.127         | 0.7312 | 910  | 1.1042          | 48649872          |
| 0.1462        | 0.7352 | 915  | 1.1043          | 48920232          |
| 0.154         | 0.7393 | 920  | 1.1035          | 49186984          |
| 0.1847        | 0.7433 | 925  | 1.1041          | 49454928          |
| 0.1678        | 0.7473 | 930  | 1.1053          | 49722280          |
| 0.1658        | 0.7513 | 935  | 1.1050          | 49988024          |
| 0.1301        | 0.7553 | 940  | 1.1053          | 50255760          |
| 0.1239        | 0.7593 | 945  | 1.1044          | 50530080          |
| 0.1458        | 0.7634 | 950  | 1.1037          | 50792368          |
| 0.152         | 0.7674 | 955  | 1.1041          | 51052328          |
| 0.1736        | 0.7714 | 960  | 1.1041          | 51318808          |
| 0.1981        | 0.7754 | 965  | 1.1030          | 51586904          |
| 0.1032        | 0.7794 | 970  | 1.1021          | 51861168          |
| 0.1126        | 0.7834 | 975  | 1.1050          | 52129208          |
| 0.2006        | 0.7875 | 980  | 1.1045          | 52395312          |
| 0.2615        | 0.7915 | 985  | 1.1011          | 52661168          |
| 0.1574        | 0.7955 | 990  | 1.1013          | 52923160          |
| 0.183         | 0.7995 | 995  | 1.1067          | 53179296          |
| 0.1247        | 0.8035 | 1000 | 1.1045          | 53445496          |
| 0.136         | 0.8076 | 1005 | 1.1013          | 53714992          |
| 0.2123        | 0.8116 | 1010 | 1.1015          | 53973440          |
| 0.1449        | 0.8156 | 1015 | 1.1025          | 54238472          |
| 0.2289        | 0.8196 | 1020 | 1.1019          | 54508944          |
| 0.1454        | 0.8236 | 1025 | 1.1013          | 54782640          |
| 0.1422        | 0.8276 | 1030 | 1.1022          | 55052512          |
| 0.1588        | 0.8317 | 1035 | 1.1022          | 55320536          |
| 0.1174        | 0.8357 | 1040 | 1.1024          | 55587976          |
| 0.1778        | 0.8397 | 1045 | 1.1006          | 55850544          |
| 0.2064        | 0.8437 | 1050 | 1.1019          | 56111488          |
| 0.1348        | 0.8477 | 1055 | 1.1043          | 56379936          |
| 0.1454        | 0.8517 | 1060 | 1.1027          | 56633752          |
| 0.0895        | 0.8558 | 1065 | 1.0997          | 56900624          |
| 0.1199        | 0.8598 | 1070 | 1.1008          | 57165704          |
| 0.1866        | 0.8638 | 1075 | 1.1013          | 57431640          |
| 0.1512        | 0.8678 | 1080 | 1.1002          | 57697040          |
| 0.1935        | 0.8718 | 1085 | 1.1003          | 57971200          |
| 0.1479        | 0.8759 | 1090 | 1.1003          | 58235216          |
| 0.1603        | 0.8799 | 1095 | 1.1010          | 58505320          |
| 0.1545        | 0.8839 | 1100 | 1.1004          | 58781952          |
| 0.1349        | 0.8879 | 1105 | 1.0978          | 59054312          |
| 0.1038        | 0.8919 | 1110 | 1.0981          | 59316192          |
| 0.2127        | 0.8959 | 1115 | 1.0985          | 59576760          |
| 0.2207        | 0.9000 | 1120 | 1.0978          | 59841800          |
| 0.1447        | 0.9040 | 1125 | 1.0980          | 60108152          |
| 0.1445        | 0.9080 | 1130 | 1.0986          | 60381688          |
| 0.123         | 0.9120 | 1135 | 1.0985          | 60644416          |
| 0.1337        | 0.9160 | 1140 | 1.0972          | 60914960          |
| 0.1519        | 0.9200 | 1145 | 1.0964          | 61189320          |
| 0.1618        | 0.9241 | 1150 | 1.0997          | 61451944          |
| 0.1586        | 0.9281 | 1155 | 1.1000          | 61715960          |
| 0.1538        | 0.9321 | 1160 | 1.0981          | 61986840          |
| 0.0929        | 0.9361 | 1165 | 1.0972          | 62255312          |
| 0.1543        | 0.9401 | 1170 | 1.0973          | 62523592          |
| 0.1406        | 0.9442 | 1175 | 1.0976          | 62795320          |
| 0.1527        | 0.9482 | 1180 | 1.0970          | 63061184          |
| 0.1556        | 0.9522 | 1185 | 1.0975          | 63326856          |
| 0.2417        | 0.9562 | 1190 | 1.0983          | 63598528          |
| 0.1064        | 0.9602 | 1195 | 1.1001          | 63861592          |
| 0.1908        | 0.9642 | 1200 | 1.0971          | 64129760          |
| 0.1303        | 0.9683 | 1205 | 1.0958          | 64399112          |
| 0.1397        | 0.9723 | 1210 | 1.0972          | 64666312          |
| 0.1802        | 0.9763 | 1215 | 1.0971          | 64938056          |
| 0.1478        | 0.9803 | 1220 | 1.0970          | 65198400          |
| 0.1511        | 0.9843 | 1225 | 1.0966          | 65460480          |
| 0.1352        | 0.9883 | 1230 | 1.0973          | 65730520          |
| 0.1681        | 0.9924 | 1235 | 1.0983          | 65993712          |
| 0.1158        | 0.9964 | 1240 | 1.0982          | 66264848          |
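
The table above can be scanned programmatically, for example to find the evaluation step with the lowest validation loss. The sketch below parses a few rows copied verbatim from the table; the function name `parse_rows` and the record layout are illustrative choices, not part of any library.

```python
# Four rows copied from the training results table above.
SAMPLE_ROWS = """
| No log        | 0      | 0    | 1.3909          | 0                 |
| 0.1519        | 0.9200 | 1145 | 1.0964          | 61189320          |
| 0.1303        | 0.9683 | 1205 | 1.0958          | 64399112          |
| 0.1158        | 0.9964 | 1240 | 1.0982          | 66264848          |
"""


def parse_rows(markdown_rows: str) -> list:
    """Turn Markdown table rows into dicts with typed values."""
    records = []
    for line in markdown_rows.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        train_loss, epoch, step, val_loss, tokens = cells
        records.append({
            # The first row logs no training loss, so keep it as None.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return records


records = parse_rows(SAMPLE_ROWS)
best = min(records, key=lambda r: r["val_loss"])
print(best["step"], best["val_loss"])

# Average input tokens consumed per optimizer step at the last logged row.
last = records[-1]
print(last["tokens"] / last["step"])
```

Run over the full table, the same scan gives a lowest validation loss of 1.0958 at step 1205, slightly below the final reported evaluation loss of 1.0988.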


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1