---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: roberta-tiny-10M
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# roberta-tiny-10M

This model was trained from scratch on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.7391
- Accuracy: 0.5148

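
For readers who prefer perplexity to raw loss, the evaluation loss can be converted with a one-line calculation. The sketch below assumes the reported loss is a mean token-level cross-entropy in nats, which is typical for language-modeling runs but is not stated in this card; the `perplexity` helper is a hypothetical name used only for illustration.

```python
import math

# Value copied from the evaluation summary in this card.
eval_loss = 2.7391


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


print(f"perplexity ~ {perplexity(eval_loss):.2f}")
```

Under that assumption the loss corresponds to a perplexity of roughly 15.5. If the loss was computed only over masked positions, this figure is a masked-token perplexity and is not comparable to the perplexity of a causal language model.
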
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 100.0
- mixed_precision_training: Native AMP

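
Two of the values above follow from the others, and a short sketch makes the arithmetic explicit: the total train batch size is the per-device batch size times the gradient accumulation steps, and the learning rate follows a warmup-then-cosine curve. The code assumes a single training device and 4800 total optimizer steps (about 48 steps per epoch, inferred from the training results in this card, times 100 epochs); neither figure is stated here. The function names are illustrative, and the schedule mirrors the common linear-warmup, half-cosine shape, which may differ in detail from what the Trainer actually applied.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 4e-4
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 32
WARMUP_STEPS = 50

# ASSUMPTIONS, not stated in this card: one training device, and 4800 total
# optimizer steps (about 48 steps per epoch times 100 epochs).
NUM_DEVICES = 1
TOTAL_STEPS = 4800


def effective_batch_size(per_device: int, accumulation: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation * devices


def cosine_lr(step: int, base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
for step in (0, 25, 50, 2425, 4800):
    print(step, f"{cosine_lr(step):.6f}")
```

With these assumptions the learning rate peaks at 0.0004 after the 50 warmup steps and decays toward zero by the final step.
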
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 7.8031 | 1.04 | 50 | 7.3560 | 0.0606 |
| 7.1948 | 2.08 | 100 | 6.7374 | 0.1182 |
| 6.8927 | 3.12 | 150 | 6.5022 | 0.1415 |
| 6.7339 | 4.16 | 200 | 6.4005 | 0.1483 |
| 6.6609 | 5.21 | 250 | 6.3535 | 0.1510 |
| 6.1972 | 6.25 | 300 | 6.3324 | 0.1519 |
| 6.1685 | 7.29 | 350 | 6.3029 | 0.1528 |
| 6.1302 | 8.33 | 400 | 6.2828 | 0.1521 |
| 6.093 | 9.37 | 450 | 6.2568 | 0.1536 |
| 6.0543 | 10.41 | 500 | 6.2430 | 0.1544 |
| 6.0479 | 11.45 | 550 | 6.2346 | 0.1541 |
| 6.0372 | 12.49 | 600 | 6.2232 | 0.1546 |
| 6.0127 | 13.53 | 650 | 6.2139 | 0.1541 |
| 5.968 | 14.58 | 700 | 6.2053 | 0.1547 |
| 5.9635 | 15.62 | 750 | 6.1996 | 0.1549 |
| 5.9479 | 16.66 | 800 | 6.1953 | 0.1548 |
| 5.9371 | 17.7 | 850 | 6.1887 | 0.1545 |
| 5.9046 | 18.74 | 900 | 6.1613 | 0.1545 |
| 5.8368 | 19.78 | 950 | 6.0952 | 0.1557 |
| 5.7914 | 20.82 | 1000 | 6.0330 | 0.1569 |
| 5.7026 | 21.86 | 1050 | 5.9430 | 0.1612 |
| 5.491 | 22.9 | 1100 | 5.6100 | 0.1974 |
| 4.9289 | 23.95 | 1150 | 4.9607 | 0.2702 |
| 4.5214 | 24.99 | 1200 | 4.5795 | 0.3051 |
| 4.5663 | 26.04 | 1250 | 4.3454 | 0.3265 |
| 4.3717 | 27.08 | 1300 | 4.1738 | 0.3412 |
| 4.1483 | 28.12 | 1350 | 4.0336 | 0.3555 |
| 3.9988 | 29.16 | 1400 | 3.9180 | 0.3677 |
| 3.8695 | 30.21 | 1450 | 3.8108 | 0.3782 |
| 3.5017 | 31.25 | 1500 | 3.7240 | 0.3879 |
| 3.4311 | 32.29 | 1550 | 3.6426 | 0.3974 |
| 3.3517 | 33.33 | 1600 | 3.5615 | 0.4068 |
| 3.2856 | 34.37 | 1650 | 3.4915 | 0.4156 |
| 3.227 | 35.41 | 1700 | 3.4179 | 0.4255 |
| 3.1675 | 36.45 | 1750 | 3.3636 | 0.4325 |
| 3.0908 | 37.49 | 1800 | 3.3083 | 0.4394 |
| 3.0561 | 38.53 | 1850 | 3.2572 | 0.4473 |
| 3.0139 | 39.58 | 1900 | 3.2159 | 0.4525 |
| 2.9837 | 40.62 | 1950 | 3.1789 | 0.4575 |
| 2.9387 | 41.66 | 2000 | 3.1431 | 0.4618 |
| 2.9034 | 42.7 | 2050 | 3.1163 | 0.4654 |
| 2.8822 | 43.74 | 2100 | 3.0842 | 0.4694 |
| 2.836 | 44.78 | 2150 | 3.0583 | 0.4727 |
| 2.8129 | 45.82 | 2200 | 3.0359 | 0.4760 |
| 2.7733 | 46.86 | 2250 | 3.0173 | 0.4776 |
| 2.7589 | 47.9 | 2300 | 2.9978 | 0.4812 |
| 2.7378 | 48.95 | 2350 | 2.9788 | 0.4831 |
| 2.7138 | 49.99 | 2400 | 2.9674 | 0.4844 |
| 2.8692 | 51.04 | 2450 | 2.9476 | 0.4874 |
| 2.8462 | 52.08 | 2500 | 2.9342 | 0.4893 |
| 2.8312 | 53.12 | 2550 | 2.9269 | 0.4900 |
| 2.7834 | 54.16 | 2600 | 2.9111 | 0.4917 |
| 2.7822 | 55.21 | 2650 | 2.8987 | 0.4934 |
| 2.584 | 56.25 | 2700 | 2.8844 | 0.4949 |
| 2.5668 | 57.29 | 2750 | 2.8808 | 0.4965 |
| 2.5536 | 58.33 | 2800 | 2.8640 | 0.4982 |
| 2.5403 | 59.37 | 2850 | 2.8606 | 0.4982 |
| 2.5294 | 60.41 | 2900 | 2.8441 | 0.5008 |
| 2.513 | 61.45 | 2950 | 2.8402 | 0.5013 |
| 2.5105 | 62.49 | 3000 | 2.8316 | 0.5022 |
| 2.4897 | 63.53 | 3050 | 2.8237 | 0.5027 |
| 2.4974 | 64.58 | 3100 | 2.8187 | 0.5040 |
| 2.4799 | 65.62 | 3150 | 2.8129 | 0.5044 |
| 2.4741 | 66.66 | 3200 | 2.8056 | 0.5057 |
| 2.4582 | 67.7 | 3250 | 2.8025 | 0.5061 |
| 2.4389 | 68.74 | 3300 | 2.7913 | 0.5076 |
| 2.4539 | 69.78 | 3350 | 2.7881 | 0.5072 |
| 2.4252 | 70.82 | 3400 | 2.7884 | 0.5082 |
| 2.4287 | 71.86 | 3450 | 2.7784 | 0.5093 |
| 2.4131 | 72.9 | 3500 | 2.7782 | 0.5099 |
| 2.4016 | 73.95 | 3550 | 2.7724 | 0.5098 |
| 2.3998 | 74.99 | 3600 | 2.7659 | 0.5111 |
| 2.5475 | 76.04 | 3650 | 2.7650 | 0.5108 |
| 2.5443 | 77.08 | 3700 | 2.7620 | 0.5117 |
| 2.5381 | 78.12 | 3750 | 2.7631 | 0.5115 |
| 2.5269 | 79.16 | 3800 | 2.7578 | 0.5122 |
| 2.5288 | 80.21 | 3850 | 2.7540 | 0.5124 |
| 2.3669 | 81.25 | 3900 | 2.7529 | 0.5125 |
| 2.3631 | 82.29 | 3950 | 2.7498 | 0.5132 |
| 2.3499 | 83.33 | 4000 | 2.7454 | 0.5136 |
| 2.3726 | 84.37 | 4050 | 2.7446 | 0.5141 |
| 2.3411 | 85.41 | 4100 | 2.7403 | 0.5144 |
| 2.3321 | 86.45 | 4150 | 2.7372 | 0.5146 |
| 2.3456 | 87.49 | 4200 | 2.7389 | 0.5146 |
| 2.3372 | 88.53 | 4250 | 2.7384 | 0.5151 |
| 2.343 | 89.58 | 4300 | 2.7398 | 0.5144 |

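
To pick a checkpoint from a log like the one above, the Markdown rows can be parsed into numbers and searched. This is a minimal sketch using only the last five rows, copied verbatim from the table; `parse_markdown_table` is a hypothetical helper, not part of any library, and it assumes every cell in the body is numeric.

```python
# The last five rows of the training results table, copied verbatim.
# Paste the full table here to search every logged checkpoint.
TABLE = """\
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 2.3411 | 85.41 | 4100 | 2.7403 | 0.5144 |
| 2.3321 | 86.45 | 4150 | 2.7372 | 0.5146 |
| 2.3456 | 87.49 | 4200 | 2.7389 | 0.5146 |
| 2.3372 | 88.53 | 4250 | 2.7384 | 0.5151 |
| 2.343 | 89.58 | 4300 | 2.7398 | 0.5144 |
"""


def parse_markdown_table(text: str) -> list[dict[str, float]]:
    """Parse a pipe-delimited Markdown table with numeric cells into row dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the alignment row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append({name: float(value) for name, value in zip(header, cells)})
    return rows


rows = parse_markdown_table(TABLE)
best_loss = min(rows, key=lambda r: r["Validation Loss"])
best_acc = max(rows, key=lambda r: r["Accuracy"])
print("lowest validation loss at step", int(best_loss["Step"]))
print("highest accuracy at step", int(best_acc["Step"]))
```

On these rows the lowest validation loss and the highest accuracy fall on different steps, so the choice of checkpoint depends on which metric matters for the downstream use.
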
### Framework versions

- Transformers 4.24.0
- PyTorch 1.11.0+cu113
- Datasets 2.6.1
- Tokenizers 0.12.1
