roberta-tiny-2l-10M

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 3.1695
  • Accuracy: 0.4534
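
Assuming the loss is the usual token-level cross-entropy of the masked-language-modelling objective, it corresponds to a perplexity of roughly exp(3.1695) ≈ 23.8, and the accuracy is presumably the fraction of masked tokens predicted exactly. Below is a minimal usage sketch; the Hub repository id and the example sentence are assumptions, and it presumes the tokenizer follows the RoBERTa convention of a `<mask>` token.

```python
# Minimal fill-mask sketch. The repo id "roberta-tiny-2l-10M" is an
# assumption; substitute the actual namespace/checkpoint path.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-tiny-2l-10M")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in fill_mask("The capital of France is <mask>."):
    print(prediction["token_str"], prediction["score"])
```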

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch in the Trainer API follows the list):

  • learning_rate: 0.0004
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 512
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
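
The sketch below restates these settings through the Transformers `TrainingArguments` API (version 4.24, matching the framework list further down). The output directory is a placeholder and the model/dataset wiring is omitted; only the hyperparameter values themselves come from this card.

```python
# Hedged reproduction of the hyperparameters above; anything not
# listed in the card (output_dir, model, data) is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-tiny-2l-10M",  # placeholder
    learning_rate=4e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=32,    # 16 * 32 = 512 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=100.0,
    fp16=True,                         # "Native AMP" mixed precision
)
```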

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 7.7619 | 1.04 | 50 | 7.2338 | 0.0748 |
| 7.0524 | 2.08 | 100 | 6.6252 | 0.1331 |
| 6.8423 | 3.12 | 150 | 6.4622 | 0.1463 |
| 6.7298 | 4.16 | 200 | 6.3971 | 0.1488 |
| 6.669 | 5.21 | 250 | 6.3628 | 0.1519 |
| 6.2038 | 6.25 | 300 | 6.3371 | 0.1518 |
| 6.1783 | 7.29 | 350 | 6.3115 | 0.1532 |
| 6.1459 | 8.33 | 400 | 6.2922 | 0.1530 |
| 6.1096 | 9.37 | 450 | 6.2696 | 0.1536 |
| 6.0745 | 10.41 | 500 | 6.2545 | 0.1541 |
| 6.0689 | 11.45 | 550 | 6.2496 | 0.1533 |
| 6.0562 | 12.49 | 600 | 6.2313 | 0.1542 |
| 6.0324 | 13.53 | 650 | 6.2248 | 0.1536 |
| 5.9907 | 14.58 | 700 | 6.2179 | 0.1544 |
| 5.9683 | 15.62 | 750 | 6.1832 | 0.1545 |
| 5.9236 | 16.66 | 800 | 6.1413 | 0.1550 |
| 5.8808 | 17.7 | 850 | 6.0900 | 0.1558 |
| 5.8392 | 18.74 | 900 | 6.0543 | 0.1566 |
| 5.7962 | 19.78 | 950 | 6.0222 | 0.1575 |
| 5.7473 | 20.82 | 1000 | 5.9471 | 0.1617 |
| 5.5787 | 21.86 | 1050 | 5.7038 | 0.1891 |
| 5.2316 | 22.9 | 1100 | 5.2708 | 0.2382 |
| 4.6613 | 23.95 | 1150 | 4.7075 | 0.2975 |
| 4.3006 | 24.99 | 1200 | 4.4180 | 0.3222 |
| 4.3754 | 26.04 | 1250 | 4.2383 | 0.3385 |
| 4.2531 | 27.08 | 1300 | 4.1157 | 0.3491 |
| 4.0987 | 28.12 | 1350 | 4.0197 | 0.3578 |
| 4.0045 | 29.16 | 1400 | 3.9504 | 0.3656 |
| 3.9145 | 30.21 | 1450 | 3.8819 | 0.3718 |
| 3.5808 | 31.25 | 1500 | 3.8279 | 0.3781 |
| 3.5354 | 32.29 | 1550 | 3.7830 | 0.3826 |
| 3.4788 | 33.33 | 1600 | 3.7400 | 0.3872 |
| 3.4315 | 34.37 | 1650 | 3.7028 | 0.3911 |
| 3.3906 | 35.41 | 1700 | 3.6629 | 0.3956 |
| 3.3508 | 36.45 | 1750 | 3.6344 | 0.3984 |
| 3.288 | 37.49 | 1800 | 3.6046 | 0.4019 |
| 3.2678 | 38.53 | 1850 | 3.5799 | 0.4053 |
| 3.2382 | 39.58 | 1900 | 3.5549 | 0.4074 |
| 3.2151 | 40.62 | 1950 | 3.5285 | 0.4103 |
| 3.1777 | 41.66 | 2000 | 3.5069 | 0.4132 |
| 3.1499 | 42.7 | 2050 | 3.4917 | 0.4150 |
| 3.131 | 43.74 | 2100 | 3.4701 | 0.4168 |
| 3.0942 | 44.78 | 2150 | 3.4530 | 0.4189 |
| 3.0683 | 45.82 | 2200 | 3.4320 | 0.4212 |
| 3.0363 | 46.86 | 2250 | 3.4195 | 0.4227 |
| 3.0264 | 47.9 | 2300 | 3.4046 | 0.4249 |
| 3.0079 | 48.95 | 2350 | 3.3874 | 0.4267 |
| 2.9869 | 49.99 | 2400 | 3.3792 | 0.4277 |
| 3.1592 | 51.04 | 2450 | 3.3655 | 0.4289 |
| 3.1353 | 52.08 | 2500 | 3.3548 | 0.4310 |
| 3.1257 | 53.12 | 2550 | 3.3489 | 0.4308 |
| 3.0822 | 54.16 | 2600 | 3.3353 | 0.4327 |
| 3.0771 | 55.21 | 2650 | 3.3220 | 0.4341 |
| 2.8639 | 56.25 | 2700 | 3.3119 | 0.4354 |
| 2.8477 | 57.29 | 2750 | 3.3104 | 0.4360 |
| 2.8373 | 58.33 | 2800 | 3.2954 | 0.4378 |
| 2.818 | 59.37 | 2850 | 3.2935 | 0.4381 |
| 2.8137 | 60.41 | 2900 | 3.2786 | 0.4394 |
| 2.7985 | 61.45 | 2950 | 3.2747 | 0.4401 |
| 2.7936 | 62.49 | 3000 | 3.2668 | 0.4411 |
| 2.7764 | 63.53 | 3050 | 3.2569 | 0.4419 |
| 2.7819 | 64.58 | 3100 | 3.2492 | 0.4434 |
| 2.7672 | 65.62 | 3150 | 3.2494 | 0.4433 |
| 2.7629 | 66.66 | 3200 | 3.2410 | 0.4443 |
| 2.747 | 67.7 | 3250 | 3.2368 | 0.4446 |
| 2.7303 | 68.74 | 3300 | 3.2246 | 0.4460 |
| 2.7461 | 69.78 | 3350 | 3.2212 | 0.4462 |
| 2.7179 | 70.82 | 3400 | 3.2217 | 0.4470 |
| 2.7184 | 71.86 | 3450 | 3.2132 | 0.4479 |
| 2.7077 | 72.9 | 3500 | 3.2086 | 0.4487 |
| 2.6916 | 73.95 | 3550 | 3.2057 | 0.4482 |
| 2.6934 | 74.99 | 3600 | 3.2010 | 0.4495 |
| 2.8585 | 76.04 | 3650 | 3.1980 | 0.4497 |
| 2.8559 | 77.08 | 3700 | 3.1940 | 0.4503 |
| 2.8519 | 78.12 | 3750 | 3.1940 | 0.4506 |
| 2.8391 | 79.16 | 3800 | 3.1897 | 0.4509 |
| 2.845 | 80.21 | 3850 | 3.1858 | 0.4510 |
| 2.6636 | 81.25 | 3900 | 3.1819 | 0.4518 |
| 2.6569 | 82.29 | 3950 | 3.1834 | 0.4517 |
| 2.647 | 83.33 | 4000 | 3.1798 | 0.4517 |
| 2.6665 | 84.37 | 4050 | 3.1786 | 0.4525 |
| 2.6382 | 85.41 | 4100 | 3.1733 | 0.4525 |
| 2.6346 | 86.45 | 4150 | 3.1700 | 0.4532 |
| 2.6457 | 87.49 | 4200 | 3.1714 | 0.4529 |
| 2.6328 | 88.53 | 4250 | 3.1686 | 0.4537 |
| 2.6429 | 89.58 | 4300 | 3.1715 | 0.4534 |
| 2.6369 | 90.62 | 4350 | 3.1687 | 0.4538 |
| 2.628 | 91.66 | 4400 | 3.1651 | 0.4539 |
| 2.6373 | 92.7 | 4450 | 3.1660 | 0.4539 |
| 2.6357 | 93.74 | 4500 | 3.1662 | 0.4537 |
| 2.6302 | 94.78 | 4550 | 3.1695 | 0.4533 |

Framework versions

  • Transformers 4.24.0
  • PyTorch 1.11.0+cu113
  • Datasets 2.6.1
  • Tokenizers 0.12.1