---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1119
- Num input tokens seen: 46,755,704

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
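
As a quick sanity check on these values (the warmup-step figure is an estimate derived from the final logged step, not a reported number):

```python
# Hyperparameters reported above.
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Effective (total) train batch size: per-device batch x accumulation steps,
# assuming a single training process. Matches the value reported above.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # -> 128

# Rough warmup estimate: the log ends at step 850 (epoch 0.9946), so one
# epoch is about 850 / 0.9946 ~= 855 optimizer steps.
estimated_total_steps = round(850 / 0.9946)
estimated_warmup_steps = round(warmup_ratio * estimated_total_steps)
print(estimated_warmup_steps)  # -> 43
```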

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7072        | 0.0059 | 5    | 1.3922          | 267024            |
| 1.5511        | 0.0117 | 10   | 1.3635          | 548560            |
| 1.5199        | 0.0176 | 15   | 1.2986          | 830488            |
| 1.4321        | 0.0234 | 20   | 1.2487          | 1101904           |
| 1.4119        | 0.0293 | 25   | 1.2016          | 1377928           |
| 1.2848        | 0.0351 | 30   | 1.1721          | 1661264           |
| 1.2144        | 0.0410 | 35   | 1.1683          | 1943544           |
| 1.1417        | 0.0468 | 40   | 1.1494          | 2221200           |
| 0.9487        | 0.0527 | 45   | 1.1817          | 2498336           |
| 0.9135        | 0.0585 | 50   | 1.1897          | 2763296           |
| 0.9445        | 0.0644 | 55   | 1.2065          | 3036752           |
| 0.8185        | 0.0702 | 60   | 1.2143          | 3310768           |
| 0.606         | 0.0761 | 65   | 1.2347          | 3581304           |
| 0.7169        | 0.0819 | 70   | 1.2386          | 3866600           |
| 0.6866        | 0.0878 | 75   | 1.2317          | 4142056           |
| 0.6366        | 0.0936 | 80   | 1.2168          | 4416216           |
| 0.5326        | 0.0995 | 85   | 1.2149          | 4686560           |
| 0.4364        | 0.1053 | 90   | 1.2276          | 4959768           |
| 0.4029        | 0.1112 | 95   | 1.2208          | 5230968           |
| 0.3969        | 0.1170 | 100  | 1.2249          | 5504384           |
| 0.4026        | 0.1229 | 105  | 1.2208          | 5775920           |
| 0.4528        | 0.1287 | 110  | 1.2238          | 6049704           |
| 0.4096        | 0.1346 | 115  | 1.2106          | 6327216           |
| 0.3988        | 0.1404 | 120  | 1.2170          | 6601008           |
| 0.4273        | 0.1463 | 125  | 1.2074          | 6877208           |
| 0.3648        | 0.1521 | 130  | 1.2093          | 7155944           |
| 0.282         | 0.1580 | 135  | 1.1978          | 7432256           |
| 0.3538        | 0.1638 | 140  | 1.2051          | 7705992           |
| 0.4239        | 0.1697 | 145  | 1.1951          | 7982024           |
| 0.4044        | 0.1755 | 150  | 1.2000          | 8253584           |
| 0.4297        | 0.1814 | 155  | 1.2035          | 8534352           |
| 0.2586        | 0.1872 | 160  | 1.1974          | 8795744           |
| 0.2682        | 0.1931 | 165  | 1.2044          | 9068272           |
| 0.3477        | 0.1989 | 170  | 1.1952          | 9346008           |
| 0.3633        | 0.2048 | 175  | 1.1954          | 9614616           |
| 0.3786        | 0.2106 | 180  | 1.1975          | 9889768           |
| 0.312         | 0.2165 | 185  | 1.1918          | 10167624          |
| 0.3204        | 0.2223 | 190  | 1.1910          | 10437856          |
| 0.3476        | 0.2282 | 195  | 1.1900          | 10712832          |
| 0.2801        | 0.2340 | 200  | 1.1882          | 10977528          |
| 0.2675        | 0.2399 | 205  | 1.1885          | 11245504          |
| 0.2818        | 0.2457 | 210  | 1.1840          | 11514312          |
| 0.2689        | 0.2516 | 215  | 1.1851          | 11793000          |
| 0.3491        | 0.2574 | 220  | 1.1872          | 12068576          |
| 0.3424        | 0.2633 | 225  | 1.1802          | 12342256          |
| 0.2694        | 0.2691 | 230  | 1.1810          | 12614032          |
| 0.4132        | 0.2750 | 235  | 1.1728          | 12882712          |
| 0.2893        | 0.2808 | 240  | 1.1700          | 13149128          |
| 0.2847        | 0.2867 | 245  | 1.1856          | 13421176          |
| 0.3198        | 0.2925 | 250  | 1.1693          | 13696120          |
| 0.2038        | 0.2984 | 255  | 1.1743          | 13965256          |
| 0.222         | 0.3042 | 260  | 1.1832          | 14243792          |
| 0.244         | 0.3101 | 265  | 1.1692          | 14524248          |
| 0.3439        | 0.3159 | 270  | 1.1722          | 14805296          |
| 0.2316        | 0.3218 | 275  | 1.1698          | 15078480          |
| 0.2024        | 0.3276 | 280  | 1.1734          | 15353592          |
| 0.2288        | 0.3335 | 285  | 1.1696          | 15628632          |
| 0.2868        | 0.3393 | 290  | 1.1661          | 15902808          |
| 0.3403        | 0.3452 | 295  | 1.1693          | 16179792          |
| 0.3238        | 0.3510 | 300  | 1.1663          | 16456880          |
| 0.236         | 0.3569 | 305  | 1.1625          | 16734104          |
| 0.1991        | 0.3627 | 310  | 1.1644          | 17008776          |
| 0.1729        | 0.3686 | 315  | 1.1646          | 17278024          |
| 0.2047        | 0.3744 | 320  | 1.1630          | 17550640          |
| 0.2911        | 0.3803 | 325  | 1.1582          | 17826960          |
| 0.1639        | 0.3861 | 330  | 1.1703          | 18098336          |
| 0.1956        | 0.3920 | 335  | 1.1660          | 18370416          |
| 0.2335        | 0.3978 | 340  | 1.1550          | 18640240          |
| 0.3123        | 0.4037 | 345  | 1.1614          | 18915192          |
| 0.2137        | 0.4095 | 350  | 1.1581          | 19182344          |
| 0.2683        | 0.4154 | 355  | 1.1541          | 19465576          |
| 0.2263        | 0.4212 | 360  | 1.1560          | 19743312          |
| 0.1861        | 0.4271 | 365  | 1.1590          | 20020896          |
| 0.2883        | 0.4329 | 370  | 1.1546          | 20294232          |
| 0.1755        | 0.4388 | 375  | 1.1525          | 20559040          |
| 0.213         | 0.4446 | 380  | 1.1534          | 20822032          |
| 0.1859        | 0.4505 | 385  | 1.1523          | 21099560          |
| 0.2529        | 0.4563 | 390  | 1.1537          | 21368144          |
| 0.242         | 0.4622 | 395  | 1.1498          | 21645832          |
| 0.1993        | 0.4680 | 400  | 1.1491          | 21924544          |
| 0.1637        | 0.4739 | 405  | 1.1509          | 22199720          |
| 0.1812        | 0.4797 | 410  | 1.1441          | 22477384          |
| 0.2141        | 0.4856 | 415  | 1.1454          | 22750888          |
| 0.2874        | 0.4914 | 420  | 1.1489          | 23027632          |
| 0.1906        | 0.4973 | 425  | 1.1413          | 23308144          |
| 0.2803        | 0.5031 | 430  | 1.1433          | 23580088          |
| 0.2174        | 0.5090 | 435  | 1.1437          | 23854088          |
| 0.2305        | 0.5148 | 440  | 1.1424          | 24134544          |
| 0.2014        | 0.5207 | 445  | 1.1465          | 24403536          |
| 0.2768        | 0.5265 | 450  | 1.1414          | 24680664          |
| 0.214         | 0.5324 | 455  | 1.1408          | 24952280          |
| 0.3169        | 0.5382 | 460  | 1.1445          | 25231192          |
| 0.2731        | 0.5441 | 465  | 1.1393          | 25505768          |
| 0.2496        | 0.5499 | 470  | 1.1391          | 25785544          |
| 0.2666        | 0.5558 | 475  | 1.1404          | 26056328          |
| 0.1958        | 0.5616 | 480  | 1.1394          | 26331200          |
| 0.1935        | 0.5675 | 485  | 1.1375          | 26610448          |
| 0.1744        | 0.5734 | 490  | 1.1368          | 26883696          |
| 0.2562        | 0.5792 | 495  | 1.1336          | 27155344          |
| 0.218         | 0.5851 | 500  | 1.1342          | 27427808          |
| 0.2348        | 0.5909 | 505  | 1.1335          | 27705544          |
| 0.2619        | 0.5968 | 510  | 1.1323          | 27974816          |
| 0.1454        | 0.6026 | 515  | 1.1351          | 28241360          |
| 0.2899        | 0.6085 | 520  | 1.1348          | 28513256          |
| 0.28          | 0.6143 | 525  | 1.1300          | 28781072          |
| 0.2314        | 0.6202 | 530  | 1.1314          | 29051688          |
| 0.1742        | 0.6260 | 535  | 1.1375          | 29322136          |
| 0.2316        | 0.6319 | 540  | 1.1320          | 29591728          |
| 0.197         | 0.6377 | 545  | 1.1289          | 29865856          |
| 0.2103        | 0.6436 | 550  | 1.1322          | 30139496          |
| 0.2218        | 0.6494 | 555  | 1.1290          | 30416656          |
| 0.205         | 0.6553 | 560  | 1.1265          | 30696792          |
| 0.1418        | 0.6611 | 565  | 1.1287          | 30971528          |
| 0.2414        | 0.6670 | 570  | 1.1276          | 31244968          |
| 0.2306        | 0.6728 | 575  | 1.1258          | 31520232          |
| 0.2341        | 0.6787 | 580  | 1.1275          | 31795864          |
| 0.2402        | 0.6845 | 585  | 1.1262          | 32069624          |
| 0.2602        | 0.6904 | 590  | 1.1263          | 32337864          |
| 0.2421        | 0.6962 | 595  | 1.1266          | 32618672          |
| 0.1608        | 0.7021 | 600  | 1.1260          | 32898536          |
| 0.266         | 0.7079 | 605  | 1.1234          | 33168224          |
| 0.1589        | 0.7138 | 610  | 1.1262          | 33433136          |
| 0.1982        | 0.7196 | 615  | 1.1257          | 33712384          |
| 0.1458        | 0.7255 | 620  | 1.1258          | 33981912          |
| 0.2513        | 0.7313 | 625  | 1.1299          | 34249392          |
| 0.1416        | 0.7372 | 630  | 1.1239          | 34521488          |
| 0.2103        | 0.7430 | 635  | 1.1246          | 34794184          |
| 0.2409        | 0.7489 | 640  | 1.1256          | 35068416          |
| 0.2248        | 0.7547 | 645  | 1.1218          | 35341160          |
| 0.2517        | 0.7606 | 650  | 1.1225          | 35618656          |
| 0.2098        | 0.7664 | 655  | 1.1215          | 35892176          |
| 0.2069        | 0.7723 | 660  | 1.1203          | 36174472          |
| 0.1857        | 0.7781 | 665  | 1.1229          | 36439872          |
| 0.2552        | 0.7840 | 670  | 1.1202          | 36714872          |
| 0.1902        | 0.7898 | 675  | 1.1188          | 36987872          |
| 0.2204        | 0.7957 | 680  | 1.1201          | 37263224          |
| 0.3015        | 0.8015 | 685  | 1.1189          | 37536992          |
| 0.2118        | 0.8074 | 690  | 1.1192          | 37793976          |
| 0.2303        | 0.8132 | 695  | 1.1178          | 38068432          |
| 0.2148        | 0.8191 | 700  | 1.1194          | 38341616          |
| 0.2132        | 0.8249 | 705  | 1.1185          | 38610776          |
| 0.1463        | 0.8308 | 710  | 1.1194          | 38888584          |
| 0.1878        | 0.8366 | 715  | 1.1210          | 39160392          |
| 0.275         | 0.8425 | 720  | 1.1178          | 39426336          |
| 0.1686        | 0.8483 | 725  | 1.1164          | 39698280          |
| 0.1518        | 0.8542 | 730  | 1.1198          | 39967168          |
| 0.2153        | 0.8600 | 735  | 1.1186          | 40242904          |
| 0.22          | 0.8659 | 740  | 1.1163          | 40515024          |
| 0.2084        | 0.8717 | 745  | 1.1172          | 40786080          |
| 0.264         | 0.8776 | 750  | 1.1143          | 41059704          |
| 0.1918        | 0.8834 | 755  | 1.1147          | 41331008          |
| 0.2444        | 0.8893 | 760  | 1.1154          | 41603928          |
| 0.1433        | 0.8951 | 765  | 1.1158          | 41873784          |
| 0.2206        | 0.9010 | 770  | 1.1152          | 42140496          |
| 0.204         | 0.9068 | 775  | 1.1131          | 42415368          |
| 0.1427        | 0.9127 | 780  | 1.1143          | 42697792          |
| 0.2541        | 0.9185 | 785  | 1.1149          | 42976216          |
| 0.2033        | 0.9244 | 790  | 1.1160          | 43250816          |
| 0.1249        | 0.9302 | 795  | 1.1139          | 43531552          |
| 0.158         | 0.9361 | 800  | 1.1146          | 43811968          |
| 0.1552        | 0.9419 | 805  | 1.1154          | 44080032          |
| 0.1523        | 0.9478 | 810  | 1.1141          | 44351688          |
| 0.1709        | 0.9536 | 815  | 1.1129          | 44631208          |
| 0.133         | 0.9595 | 820  | 1.1129          | 44900816          |
| 0.2698        | 0.9653 | 825  | 1.1133          | 45173960          |
| 0.1856        | 0.9712 | 830  | 1.1131          | 45444920          |
| 0.2218        | 0.9770 | 835  | 1.1141          | 45715920          |
| 0.1803        | 0.9829 | 840  | 1.1164          | 45990080          |
| 0.2412        | 0.9887 | 845  | 1.1138          | 46264088          |
| 0.3314        | 0.9946 | 850  | 1.1103          | 46540904          |


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1