update model card README.md
README.md
ADDED
@@ -0,0 +1,152 @@
---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: roberta-tiny-2l-10M
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# roberta-tiny-2l-10M

This model was trained from scratch on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.1695
- Accuracy: 0.4533
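
For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted as in the minimal sketch below. It assumes the loss is a mean per-token cross-entropy in nats, which the card does not state explicitly; for a masked language model the result would be a pseudo-perplexity over the masked positions only.

```python
import math

# "Loss" reported on the evaluation set above.
eval_loss = 3.1695

# Assumption (not stated in the card): the loss is a mean cross-entropy in nats,
# so exponentiating it gives a perplexity-style number.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the value comes out at roughly 23.8.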

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 100.0
- mixed_precision_training: Native AMP
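
Two of the values above follow from the others, and the sketch below makes that arithmetic explicit. It is an illustration under stated assumptions, not the training script: the total step count of 4800 is inferred from the results table (step 4550 is logged at epoch 94.78, so about 48 optimizer steps per epoch), and the schedule shape (linear warmup, then a half-cosine decay to zero) is assumed to be what `lr_scheduler_type: cosine` selected. The helper name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 16
gradient_accumulation_steps = 32
learning_rate = 4e-4
warmup_steps = 50

# Sequences consumed per optimizer step; this reproduces total_train_batch_size
# and is consistent with a single training process.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: about 48 optimizer steps per epoch (4550 / 94.78), so 100 epochs
# would be about 4800 steps. The card itself does not state this number.
total_steps = 48 * 100


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size)
for step in (0, 25, 50, 2425, 4800):
    print(step, f"{lr_at(step):.6f}")
```

With these numbers the rate peaks at 0.0004 once warmup ends at step 50 and decays toward zero by the assumed final step.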

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 7.7619        | 1.04  | 50   | 7.2338          | 0.0748   |
| 7.0524        | 2.08  | 100  | 6.6252          | 0.1331   |
| 6.8423        | 3.12  | 150  | 6.4622          | 0.1463   |
| 6.7298        | 4.16  | 200  | 6.3971          | 0.1488   |
| 6.6690        | 5.21  | 250  | 6.3628          | 0.1519   |
| 6.2038        | 6.25  | 300  | 6.3371          | 0.1518   |
| 6.1783        | 7.29  | 350  | 6.3115          | 0.1532   |
| 6.1459        | 8.33  | 400  | 6.2922          | 0.1530   |
| 6.1096        | 9.37  | 450  | 6.2696          | 0.1536   |
| 6.0745        | 10.41 | 500  | 6.2545          | 0.1541   |
| 6.0689        | 11.45 | 550  | 6.2496          | 0.1533   |
| 6.0562        | 12.49 | 600  | 6.2313          | 0.1542   |
| 6.0324        | 13.53 | 650  | 6.2248          | 0.1536   |
| 5.9907        | 14.58 | 700  | 6.2179          | 0.1544   |
| 5.9683        | 15.62 | 750  | 6.1832          | 0.1545   |
| 5.9236        | 16.66 | 800  | 6.1413          | 0.1550   |
| 5.8808        | 17.70 | 850  | 6.0900          | 0.1558   |
| 5.8392        | 18.74 | 900  | 6.0543          | 0.1566   |
| 5.7962        | 19.78 | 950  | 6.0222          | 0.1575   |
| 5.7473        | 20.82 | 1000 | 5.9471          | 0.1617   |
| 5.5787        | 21.86 | 1050 | 5.7038          | 0.1891   |
| 5.2316        | 22.90 | 1100 | 5.2708          | 0.2382   |
| 4.6613        | 23.95 | 1150 | 4.7075          | 0.2975   |
| 4.3006        | 24.99 | 1200 | 4.4180          | 0.3222   |
| 4.3754        | 26.04 | 1250 | 4.2383          | 0.3385   |
| 4.2531        | 27.08 | 1300 | 4.1157          | 0.3491   |
| 4.0987        | 28.12 | 1350 | 4.0197          | 0.3578   |
| 4.0045        | 29.16 | 1400 | 3.9504          | 0.3656   |
| 3.9145        | 30.21 | 1450 | 3.8819          | 0.3718   |
| 3.5808        | 31.25 | 1500 | 3.8279          | 0.3781   |
| 3.5354        | 32.29 | 1550 | 3.7830          | 0.3826   |
| 3.4788        | 33.33 | 1600 | 3.7400          | 0.3872   |
| 3.4315        | 34.37 | 1650 | 3.7028          | 0.3911   |
| 3.3906        | 35.41 | 1700 | 3.6629          | 0.3956   |
| 3.3508        | 36.45 | 1750 | 3.6344          | 0.3984   |
| 3.2880        | 37.49 | 1800 | 3.6046          | 0.4019   |
| 3.2678        | 38.53 | 1850 | 3.5799          | 0.4053   |
| 3.2382        | 39.58 | 1900 | 3.5549          | 0.4074   |
| 3.2151        | 40.62 | 1950 | 3.5285          | 0.4103   |
| 3.1777        | 41.66 | 2000 | 3.5069          | 0.4132   |
| 3.1499        | 42.70 | 2050 | 3.4917          | 0.4150   |
| 3.1310        | 43.74 | 2100 | 3.4701          | 0.4168   |
| 3.0942        | 44.78 | 2150 | 3.4530          | 0.4189   |
| 3.0683        | 45.82 | 2200 | 3.4320          | 0.4212   |
| 3.0363        | 46.86 | 2250 | 3.4195          | 0.4227   |
| 3.0264        | 47.90 | 2300 | 3.4046          | 0.4249   |
| 3.0079        | 48.95 | 2350 | 3.3874          | 0.4267   |
| 2.9869        | 49.99 | 2400 | 3.3792          | 0.4277   |
| 3.1592        | 51.04 | 2450 | 3.3655          | 0.4289   |
| 3.1353        | 52.08 | 2500 | 3.3548          | 0.4310   |
| 3.1257        | 53.12 | 2550 | 3.3489          | 0.4308   |
| 3.0822        | 54.16 | 2600 | 3.3353          | 0.4327   |
| 3.0771        | 55.21 | 2650 | 3.3220          | 0.4341   |
| 2.8639        | 56.25 | 2700 | 3.3119          | 0.4354   |
| 2.8477        | 57.29 | 2750 | 3.3104          | 0.4360   |
| 2.8373        | 58.33 | 2800 | 3.2954          | 0.4378   |
| 2.8180        | 59.37 | 2850 | 3.2935          | 0.4381   |
| 2.8137        | 60.41 | 2900 | 3.2786          | 0.4394   |
| 2.7985        | 61.45 | 2950 | 3.2747          | 0.4401   |
| 2.7936        | 62.49 | 3000 | 3.2668          | 0.4411   |
| 2.7764        | 63.53 | 3050 | 3.2569          | 0.4419   |
| 2.7819        | 64.58 | 3100 | 3.2492          | 0.4434   |
| 2.7672        | 65.62 | 3150 | 3.2494          | 0.4433   |
| 2.7629        | 66.66 | 3200 | 3.2410          | 0.4443   |
| 2.7470        | 67.70 | 3250 | 3.2368          | 0.4446   |
| 2.7303        | 68.74 | 3300 | 3.2246          | 0.4460   |
| 2.7461        | 69.78 | 3350 | 3.2212          | 0.4462   |
| 2.7179        | 70.82 | 3400 | 3.2217          | 0.4470   |
| 2.7184        | 71.86 | 3450 | 3.2132          | 0.4479   |
| 2.7077        | 72.90 | 3500 | 3.2086          | 0.4487   |
| 2.6916        | 73.95 | 3550 | 3.2057          | 0.4482   |
| 2.6934        | 74.99 | 3600 | 3.2010          | 0.4495   |
| 2.8585        | 76.04 | 3650 | 3.1980          | 0.4497   |
| 2.8559        | 77.08 | 3700 | 3.1940          | 0.4503   |
| 2.8519        | 78.12 | 3750 | 3.1940          | 0.4506   |
| 2.8391        | 79.16 | 3800 | 3.1897          | 0.4509   |
| 2.8450        | 80.21 | 3850 | 3.1858          | 0.4510   |
| 2.6636        | 81.25 | 3900 | 3.1819          | 0.4518   |
| 2.6569        | 82.29 | 3950 | 3.1834          | 0.4517   |
| 2.6470        | 83.33 | 4000 | 3.1798          | 0.4517   |
| 2.6665        | 84.37 | 4050 | 3.1786          | 0.4525   |
| 2.6382        | 85.41 | 4100 | 3.1733          | 0.4525   |
| 2.6346        | 86.45 | 4150 | 3.1700          | 0.4532   |
| 2.6457        | 87.49 | 4200 | 3.1714          | 0.4529   |
| 2.6328        | 88.53 | 4250 | 3.1686          | 0.4537   |
| 2.6429        | 89.58 | 4300 | 3.1715          | 0.4534   |
| 2.6369        | 90.62 | 4350 | 3.1687          | 0.4538   |
| 2.6280        | 91.66 | 4400 | 3.1651          | 0.4539   |
| 2.6373        | 92.70 | 4450 | 3.1660          | 0.4539   |
| 2.6357        | 93.74 | 4500 | 3.1662          | 0.4537   |
| 2.6302        | 94.78 | 4550 | 3.1695          | 0.4533   |
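
The validation loss flattens over the last few hundred steps, and the final logged row is not the lowest one. The short sketch below, which copies only the last seven rows of the table, shows one way to pick out the best evaluation step. Whether a checkpoint was actually saved at that step is not something the card says.

```python
# (step, validation_loss, accuracy) for the last seven rows of the table above.
rows = [
    (4250, 3.1686, 0.4537),
    (4300, 3.1715, 0.4534),
    (4350, 3.1687, 0.4538),
    (4400, 3.1651, 0.4539),
    (4450, 3.1660, 0.4539),
    (4500, 3.1662, 0.4537),
    (4550, 3.1695, 0.4533),
]

# Lowest validation loss among these rows, and the last logged row.
best = min(rows, key=lambda row: row[1])
final = rows[-1]

print(f"best:  step {best[0]}, loss {best[1]:.4f}, accuracy {best[2]:.4f}")
print(f"final: step {final[0]}, loss {final[1]:.4f}, accuracy {final[2]:.4f}")
```

The headline metrics at the top of the card (loss 3.1695, accuracy 0.4533) match the final row at step 4550, while the lowest validation loss in the table (3.1651) is logged at step 4400.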

### Framework versions

- Transformers 4.24.0
- PyTorch 1.11.0+cu113
- Datasets 2.6.1
- Tokenizers 0.12.1
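
To compare a local environment against the versions listed above, something like the following sketch can help. The helper name `find_mismatches` is hypothetical, and the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names rather than anything stated in the card.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list above.
EXPECTED = {
    "transformers": "4.24.0",
    "torch": "1.11.0+cu113",
    "datasets": "2.6.1",
    "tokenizers": "0.12.1",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for each package whose version differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An exact match is a strict check; newer releases may well load the model, but that is not verified here.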