---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd1
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set at the end of training (step 1050):
- Loss: 1.0961
- Num Input Tokens Seen: 56359768 (total training input tokens processed by that step)
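
The card does not define the loss. If it is the mean per-token cross-entropy in nats (the usual convention for causal language-model fine-tuning with the `transformers` Trainer, but an assumption here), it converts to perplexity as `exp(loss)`. A minimal sketch, where `perplexity` is a hypothetical helper name:

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss measured in nats."""
    return math.exp(mean_ce_loss_nats)


if __name__ == "__main__":
    # Final evaluation loss reported above, and the step-0 validation loss
    # (base model, before any update) from the training-results table.
    print(f"final:   {perplexity(1.0961):.2f}")
    print(f"initial: {perplexity(1.3909):.2f}")
```

Under that assumption, the final loss corresponds to a perplexity of roughly 2.99, compared with about 4.02 for the base model at step 0.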

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
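
The hyperparameters above fit together as follows. This is a pure-Python sketch, not the Trainer's own code, and it rests on three assumptions: training ran on a single device (so 8 × 16 = 128 sequences per optimizer step, matching `total_train_batch_size`); the run had about 1,050 optimizer steps in total (the last row of the results table is step 1050 at epoch 0.9997); and the warmup length is `ceil(total_steps * warmup_ratio)`, which is how `transformers` derives warmup steps when only a ratio is given. With `constant_with_warmup`, the learning rate ramps linearly from 0 to 8e-06 over the warmup steps and then stays flat. The helper function names are illustrative.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 8e-06
PER_DEVICE_TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumptions, not stated in the hyperparameter list.
NUM_DEVICES = 1
TOTAL_STEPS = 1050


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of sequences that contribute to one optimizer update."""
    return per_device * grad_accum * num_devices


def num_warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length when only a ratio is given: ceil(total_steps * ratio)."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear ramp from 0 to base_lr over `warmup_steps`, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    warmup = num_warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    batch = effective_batch_size(
        PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES
    )
    print("effective batch size:", batch)
    print("warmup steps:", warmup)
    # Learning rate at the start, mid-warmup, end of warmup, and final step.
    for step in (0, warmup // 2, warmup, TOTAL_STEPS):
        print(step, constant_with_warmup_lr(step, LEARNING_RATE, warmup))
```

Under these assumptions the warmup covers 53 optimizer steps, so the first ten evaluated rows of the results table (steps 5 through 50) fall inside the linear ramp.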

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6209        | 0.0048 | 5    | 1.3888          | 260784            |
| 1.6489        | 0.0095 | 10   | 1.3680          | 526144            |
| 1.532         | 0.0143 | 15   | 1.3201          | 802968            |
| 1.4996        | 0.0190 | 20   | 1.2669          | 1075152           |
| 1.3397        | 0.0238 | 25   | 1.2273          | 1346840           |
| 1.3125        | 0.0286 | 30   | 1.1901          | 1611296           |
| 1.1233        | 0.0333 | 35   | 1.1868          | 1879352           |
| 0.9443        | 0.0381 | 40   | 1.2076          | 2150000           |
| 0.8999        | 0.0428 | 45   | 1.2196          | 2423208           |
| 0.7159        | 0.0476 | 50   | 1.2657          | 2690856           |
| 0.5995        | 0.0524 | 55   | 1.2849          | 2959080           |
| 0.5344        | 0.0571 | 60   | 1.2687          | 3228184           |
| 0.4838        | 0.0619 | 65   | 1.2509          | 3494504           |
| 0.4883        | 0.0666 | 70   | 1.2302          | 3753896           |
| 0.2679        | 0.0714 | 75   | 1.2422          | 4017544           |
| 0.3494        | 0.0762 | 80   | 1.2179          | 4279264           |
| 0.2953        | 0.0809 | 85   | 1.2130          | 4546928           |
| 0.3641        | 0.0857 | 90   | 1.2049          | 4813768           |
| 0.3191        | 0.0904 | 95   | 1.1886          | 5081944           |
| 0.2461        | 0.0952 | 100  | 1.1918          | 5354328           |
| 0.2695        | 0.1000 | 105  | 1.1858          | 5614768           |
| 0.2698        | 0.1047 | 110  | 1.1833          | 5876376           |
| 0.233         | 0.1095 | 115  | 1.1827          | 6142224           |
| 0.2352        | 0.1143 | 120  | 1.1814          | 6407280           |
| 0.2541        | 0.1190 | 125  | 1.1773          | 6668672           |
| 0.2663        | 0.1238 | 130  | 1.1767          | 6941760           |
| 0.2468        | 0.1285 | 135  | 1.1729          | 7204472           |
| 0.2495        | 0.1333 | 140  | 1.1707          | 7475488           |
| 0.1993        | 0.1381 | 145  | 1.1663          | 7743640           |
| 0.2654        | 0.1428 | 150  | 1.1684          | 8002848           |
| 0.2451        | 0.1476 | 155  | 1.1671          | 8262672           |
| 0.2106        | 0.1523 | 160  | 1.1616          | 8533696           |
| 0.1963        | 0.1571 | 165  | 1.1632          | 8807600           |
| 0.1678        | 0.1619 | 170  | 1.1628          | 9072528           |
| 0.2143        | 0.1666 | 175  | 1.1583          | 9342824           |
| 0.1857        | 0.1714 | 180  | 1.1554          | 9613104           |
| 0.2452        | 0.1761 | 185  | 1.1616          | 9877512           |
| 0.2276        | 0.1809 | 190  | 1.1538          | 10145024          |
| 0.1419        | 0.1857 | 195  | 1.1536          | 10415848          |
| 0.2847        | 0.1904 | 200  | 1.1557          | 10688320          |
| 0.1709        | 0.1952 | 205  | 1.1516          | 10955016          |
| 0.2264        | 0.1999 | 210  | 1.1518          | 11221864          |
| 0.2114        | 0.2047 | 215  | 1.1476          | 11497224          |
| 0.1591        | 0.2095 | 220  | 1.1493          | 11764448          |
| 0.2429        | 0.2142 | 225  | 1.1469          | 12032800          |
| 0.222         | 0.2190 | 230  | 1.1445          | 12296112          |
| 0.1844        | 0.2237 | 235  | 1.1441          | 12567544          |
| 0.2173        | 0.2285 | 240  | 1.1410          | 12829728          |
| 0.2398        | 0.2333 | 245  | 1.1391          | 13093064          |
| 0.1395        | 0.2380 | 250  | 1.1423          | 13355296          |
| 0.1644        | 0.2428 | 255  | 1.1410          | 13617216          |
| 0.1951        | 0.2475 | 260  | 1.1406          | 13887744          |
| 0.1772        | 0.2523 | 265  | 1.1408          | 14158392          |
| 0.2206        | 0.2571 | 270  | 1.1384          | 14432008          |
| 0.2658        | 0.2618 | 275  | 1.1363          | 14695872          |
| 0.1841        | 0.2666 | 280  | 1.1364          | 14962984          |
| 0.1656        | 0.2713 | 285  | 1.1373          | 15239000          |
| 0.2024        | 0.2761 | 290  | 1.1360          | 15503752          |
| 0.1559        | 0.2809 | 295  | 1.1363          | 15765512          |
| 0.1714        | 0.2856 | 300  | 1.1352          | 16036816          |
| 0.102         | 0.2904 | 305  | 1.1352          | 16300064          |
| 0.2057        | 0.2952 | 310  | 1.1364          | 16571744          |
| 0.2353        | 0.2999 | 315  | 1.1326          | 16840544          |
| 0.1378        | 0.3047 | 320  | 1.1306          | 17110016          |
| 0.1395        | 0.3094 | 325  | 1.1366          | 17380776          |
| 0.1747        | 0.3142 | 330  | 1.1318          | 17647416          |
| 0.1444        | 0.3190 | 335  | 1.1308          | 17913208          |
| 0.2003        | 0.3237 | 340  | 1.1325          | 18180568          |
| 0.1373        | 0.3285 | 345  | 1.1339          | 18451296          |
| 0.1483        | 0.3332 | 350  | 1.1310          | 18726416          |
| 0.2017        | 0.3380 | 355  | 1.1290          | 18999216          |
| 0.1496        | 0.3428 | 360  | 1.1284          | 19277048          |
| 0.1912        | 0.3475 | 365  | 1.1289          | 19546024          |
| 0.1944        | 0.3523 | 370  | 1.1312          | 19817824          |
| 0.1897        | 0.3570 | 375  | 1.1294          | 20083960          |
| 0.1735        | 0.3618 | 380  | 1.1252          | 20350640          |
| 0.2085        | 0.3666 | 385  | 1.1258          | 20619120          |
| 0.1385        | 0.3713 | 390  | 1.1300          | 20888696          |
| 0.1942        | 0.3761 | 395  | 1.1233          | 21156856          |
| 0.1413        | 0.3808 | 400  | 1.1238          | 21425648          |
| 0.2178        | 0.3856 | 405  | 1.1257          | 21696448          |
| 0.2536        | 0.3904 | 410  | 1.1219          | 21967312          |
| 0.1956        | 0.3951 | 415  | 1.1249          | 22234304          |
| 0.1643        | 0.3999 | 420  | 1.1239          | 22503168          |
| 0.2683        | 0.4046 | 425  | 1.1195          | 22769672          |
| 0.1949        | 0.4094 | 430  | 1.1190          | 23040264          |
| 0.2001        | 0.4142 | 435  | 1.1240          | 23309600          |
| 0.1348        | 0.4189 | 440  | 1.1218          | 23579856          |
| 0.1836        | 0.4237 | 445  | 1.1212          | 23852144          |
| 0.1498        | 0.4284 | 450  | 1.1212          | 24114304          |
| 0.1595        | 0.4332 | 455  | 1.1242          | 24376912          |
| 0.1384        | 0.4380 | 460  | 1.1204          | 24644368          |
| 0.1569        | 0.4427 | 465  | 1.1194          | 24915744          |
| 0.1477        | 0.4475 | 470  | 1.1190          | 25183280          |
| 0.1853        | 0.4522 | 475  | 1.1173          | 25457376          |
| 0.1485        | 0.4570 | 480  | 1.1187          | 25732664          |
| 0.165         | 0.4618 | 485  | 1.1204          | 26004360          |
| 0.1977        | 0.4665 | 490  | 1.1197          | 26270144          |
| 0.1273        | 0.4713 | 495  | 1.1173          | 26541272          |
| 0.2433        | 0.4760 | 500  | 1.1174          | 26806808          |
| 0.1909        | 0.4808 | 505  | 1.1178          | 27074376          |
| 0.191         | 0.4856 | 510  | 1.1189          | 27338952          |
| 0.2088        | 0.4903 | 515  | 1.1169          | 27606808          |
| 0.1777        | 0.4951 | 520  | 1.1147          | 27875304          |
| 0.208         | 0.4999 | 525  | 1.1175          | 28144272          |
| 0.1745        | 0.5046 | 530  | 1.1159          | 28409000          |
| 0.1306        | 0.5094 | 535  | 1.1128          | 28674056          |
| 0.1432        | 0.5141 | 540  | 1.1160          | 28943648          |
| 0.2056        | 0.5189 | 545  | 1.1164          | 29207648          |
| 0.1777        | 0.5237 | 550  | 1.1132          | 29477544          |
| 0.2033        | 0.5284 | 555  | 1.1140          | 29744816          |
| 0.1983        | 0.5332 | 560  | 1.1136          | 30021232          |
| 0.2389        | 0.5379 | 565  | 1.1130          | 30291032          |
| 0.1681        | 0.5427 | 570  | 1.1152          | 30555728          |
| 0.1639        | 0.5475 | 575  | 1.1131          | 30827752          |
| 0.195         | 0.5522 | 580  | 1.1102          | 31097840          |
| 0.1447        | 0.5570 | 585  | 1.1113          | 31373424          |
| 0.2198        | 0.5617 | 590  | 1.1115          | 31639232          |
| 0.1382        | 0.5665 | 595  | 1.1116          | 31901832          |
| 0.1605        | 0.5713 | 600  | 1.1122          | 32167752          |
| 0.2186        | 0.5760 | 605  | 1.1121          | 32436896          |
| 0.1891        | 0.5808 | 610  | 1.1104          | 32710184          |
| 0.1787        | 0.5855 | 615  | 1.1113          | 32984864          |
| 0.1706        | 0.5903 | 620  | 1.1107          | 33251232          |
| 0.2048        | 0.5951 | 625  | 1.1105          | 33527304          |
| 0.191         | 0.5998 | 630  | 1.1102          | 33798576          |
| 0.124         | 0.6046 | 635  | 1.1098          | 34063624          |
| 0.1499        | 0.6093 | 640  | 1.1079          | 34330376          |
| 0.1055        | 0.6141 | 645  | 1.1087          | 34599840          |
| 0.164         | 0.6189 | 650  | 1.1103          | 34865960          |
| 0.1665        | 0.6236 | 655  | 1.1105          | 35135704          |
| 0.14          | 0.6284 | 660  | 1.1088          | 35404640          |
| 0.1862        | 0.6331 | 665  | 1.1116          | 35670952          |
| 0.196         | 0.6379 | 670  | 1.1110          | 35938232          |
| 0.1475        | 0.6427 | 675  | 1.1083          | 36200712          |
| 0.1698        | 0.6474 | 680  | 1.1059          | 36476144          |
| 0.1544        | 0.6522 | 685  | 1.1072          | 36741712          |
| 0.1455        | 0.6569 | 690  | 1.1097          | 37007608          |
| 0.2331        | 0.6617 | 695  | 1.1074          | 37267184          |
| 0.1697        | 0.6665 | 700  | 1.1065          | 37537536          |
| 0.1208        | 0.6712 | 705  | 1.1076          | 37799632          |
| 0.1679        | 0.6760 | 710  | 1.1089          | 38067184          |
| 0.1931        | 0.6807 | 715  | 1.1075          | 38340032          |
| 0.1315        | 0.6855 | 720  | 1.1077          | 38613992          |
| 0.1194        | 0.6903 | 725  | 1.1079          | 38894384          |
| 0.1902        | 0.6950 | 730  | 1.1070          | 39172040          |
| 0.1675        | 0.6998 | 735  | 1.1072          | 39444864          |
| 0.1516        | 0.7046 | 740  | 1.1061          | 39716440          |
| 0.0847        | 0.7093 | 745  | 1.1049          | 39983736          |
| 0.1703        | 0.7141 | 750  | 1.1057          | 40256696          |
| 0.1791        | 0.7188 | 755  | 1.1056          | 40521264          |
| 0.2551        | 0.7236 | 760  | 1.1044          | 40793072          |
| 0.1814        | 0.7284 | 765  | 1.1054          | 41064248          |
| 0.126         | 0.7331 | 770  | 1.1070          | 41338416          |
| 0.211         | 0.7379 | 775  | 1.1049          | 41600992          |
| 0.1668        | 0.7426 | 780  | 1.1043          | 41870408          |
| 0.1821        | 0.7474 | 785  | 1.1061          | 42139008          |
| 0.186         | 0.7522 | 790  | 1.1033          | 42407016          |
| 0.209         | 0.7569 | 795  | 1.1039          | 42662976          |
| 0.226         | 0.7617 | 800  | 1.1040          | 42934592          |
| 0.1668        | 0.7664 | 805  | 1.1026          | 43199808          |
| 0.2089        | 0.7712 | 810  | 1.1019          | 43460640          |
| 0.1736        | 0.7760 | 815  | 1.1038          | 43729072          |
| 0.1403        | 0.7807 | 820  | 1.1022          | 43997664          |
| 0.1947        | 0.7855 | 825  | 1.1017          | 44258840          |
| 0.1333        | 0.7902 | 830  | 1.1020          | 44518528          |
| 0.2415        | 0.7950 | 835  | 1.1042          | 44785256          |
| 0.1791        | 0.7998 | 840  | 1.1018          | 45057824          |
| 0.2226        | 0.8045 | 845  | 1.1013          | 45326808          |
| 0.1988        | 0.8093 | 850  | 1.1012          | 45595496          |
| 0.207         | 0.8140 | 855  | 1.1026          | 45862328          |
| 0.1112        | 0.8188 | 860  | 1.1019          | 46130024          |
| 0.1775        | 0.8236 | 865  | 1.1030          | 46405848          |
| 0.2009        | 0.8283 | 870  | 1.1019          | 46676936          |
| 0.1478        | 0.8331 | 875  | 1.1004          | 46955008          |
| 0.2381        | 0.8378 | 880  | 1.1006          | 47220736          |
| 0.1951        | 0.8426 | 885  | 1.0998          | 47486768          |
| 0.1363        | 0.8474 | 890  | 1.0995          | 47750624          |
| 0.1287        | 0.8521 | 895  | 1.0994          | 48029400          |
| 0.144         | 0.8569 | 900  | 1.1004          | 48301424          |
| 0.1721        | 0.8616 | 905  | 1.0982          | 48569280          |
| 0.1385        | 0.8664 | 910  | 1.0990          | 48836384          |
| 0.1721        | 0.8712 | 915  | 1.0983          | 49104000          |
| 0.2214        | 0.8759 | 920  | 1.0981          | 49378064          |
| 0.1441        | 0.8807 | 925  | 1.0987          | 49643256          |
| 0.2227        | 0.8855 | 930  | 1.1017          | 49914304          |
| 0.1388        | 0.8902 | 935  | 1.1024          | 50184528          |
| 0.1303        | 0.8950 | 940  | 1.0992          | 50453176          |
| 0.192         | 0.8997 | 945  | 1.0968          | 50723312          |
| 0.1817        | 0.9045 | 950  | 1.0985          | 50998824          |
| 0.1661        | 0.9093 | 955  | 1.0989          | 51273248          |
| 0.1249        | 0.9140 | 960  | 1.0994          | 51535824          |
| 0.1622        | 0.9188 | 965  | 1.0993          | 51805072          |
| 0.1294        | 0.9235 | 970  | 1.0982          | 52074128          |
| 0.1132        | 0.9283 | 975  | 1.0975          | 52340296          |
| 0.109         | 0.9331 | 980  | 1.0977          | 52606592          |
| 0.1585        | 0.9378 | 985  | 1.0972          | 52876464          |
| 0.1702        | 0.9426 | 990  | 1.0972          | 53144688          |
| 0.1798        | 0.9473 | 995  | 1.0986          | 53419016          |
| 0.2313        | 0.9521 | 1000 | 1.0993          | 53685696          |
| 0.1984        | 0.9569 | 1005 | 1.0963          | 53948536          |
| 0.1253        | 0.9616 | 1010 | 1.0970          | 54213712          |
| 0.1165        | 0.9664 | 1015 | 1.0979          | 54480072          |
| 0.181         | 0.9711 | 1020 | 1.0970          | 54753368          |
| 0.1439        | 0.9759 | 1025 | 1.0959          | 55022136          |
| 0.1115        | 0.9807 | 1030 | 1.0979          | 55293552          |
| 0.1213        | 0.9854 | 1035 | 1.0991          | 55557936          |
| 0.1227        | 0.9902 | 1040 | 1.0979          | 55828888          |
| 0.1455        | 0.9949 | 1045 | 1.0967          | 56094144          |
| 0.1732        | 0.9997 | 1050 | 1.0961          | 56359768          |
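
The `Epoch`, `Step`, and `Input Tokens Seen` columns can be cross-checked against the hyperparameters. The sketch below is a back-of-the-envelope calculation, not part of the original training code. It assumes that one optimizer step consumes 128 sequences, that the token counter counts every input position (so any padding is included), and that the Trainer reports epoch as micro-batches processed divided by micro-batches per epoch. All function names are hypothetical.

```python
# Numbers come from the table's final row and the hyperparameter list;
# how each counter is interpreted is an assumption (see the lead-in).
FINAL_STEP = 1050
FINAL_EPOCH = 0.9997  # rounded to 4 decimals in the table
TOKENS_SEEN = 56_359_768
TOTAL_TRAIN_BATCH_SIZE = 128  # sequences per optimizer step
GRADIENT_ACCUMULATION_STEPS = 16
PER_DEVICE_TRAIN_BATCH_SIZE = 8


def tokens_per_step(tokens_seen: int, step: int) -> float:
    """Average number of input tokens consumed per optimizer step."""
    return tokens_seen / step


def tokens_per_sequence(tokens_seen: int, step: int, batch_size: int) -> float:
    """Average sequence length (padding included, if any) with `batch_size` sequences per step."""
    return tokens_seen / (step * batch_size)


def micro_batches_per_epoch(step: int, epoch: float, grad_accum: int) -> float:
    """Invert epoch = step * grad_accum / micro_batches (approximate, since epoch is rounded)."""
    return step * grad_accum / epoch


def epoch_at_step(step: int, grad_accum: int, n_micro_batches: int) -> float:
    """Fraction of an epoch completed after `step` optimizer steps."""
    return step * grad_accum / n_micro_batches


if __name__ == "__main__":
    n_micro = round(
        micro_batches_per_epoch(FINAL_STEP, FINAL_EPOCH, GRADIENT_ACCUMULATION_STEPS)
    )
    print(f"tokens per optimizer step : {tokens_per_step(TOKENS_SEEN, FINAL_STEP):,.0f}")
    print(
        "tokens per sequence       : "
        f"{tokens_per_sequence(TOKENS_SEEN, FINAL_STEP, TOTAL_TRAIN_BATCH_SIZE):.1f}"
    )
    print(f"micro-batches per epoch   : ~{n_micro:,}")
    # Upper bound: the last micro-batch of the epoch may be partial.
    print(f"training sequences (max)  : ~{n_micro * PER_DEVICE_TRAIN_BATCH_SIZE:,}")
    # Spot-check the inferred epoch size against other rows of the table.
    for step in (5, 500, 1000):
        print(step, round(epoch_at_step(step, GRADIENT_ACCUMULATION_STEPS, n_micro), 4))
```

Under these assumptions the final row works out to about 53,676 tokens per optimizer step (roughly 419 per sequence), and an epoch of about 16,805 micro-batches reproduces the table's `Epoch` values at steps 5, 500, and 1000 (0.0048, 0.4760, 0.9521).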


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
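
To check a local environment against these versions before trying to reproduce the run, a small standard-library sketch like the one below can help. It is illustrative only: it maps "Pytorch" to its distribution name `torch` and compares version strings exactly. An exact `torch` match therefore also depends on the CUDA build suffix (`+cu121`), so a CPU build or a build for a different CUDA version will show up as a difference even when the base version agrees.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; "Pytorch" is distributed as `torch`.
EXPECTED_VERSIONS = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None, bool]]:
    """Map each package to (expected version, installed version or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, ok) in compare_versions(EXPECTED_VERSIONS).items():
        status = "ok" if ok else "differs"
        print(f"{package:<12} expected {wanted:<14} installed {installed or 'missing':<14} {status}")
```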