# long_first_headfinal_seed-42_1e-3
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 5.1160
- Accuracy: 0.2007
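For context (an added note, assuming the reported loss is mean cross-entropy in nats, the standard objective for causal language modeling with the Transformers `Trainer`): a validation loss of 5.1160 corresponds to a perplexity of exp(5.1160) ≈ 167.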
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
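The hyperparameters above mirror the arguments accepted by the Hugging Face `Trainer`. As a minimal sketch (the actual training script is not included in this card; `output_dir` is a placeholder), the configuration translates to:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="long_first_headfinal_seed-42_1e-3",  # placeholder path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,  # 32 * 8 = 256 effective train batch
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=32000,
    num_train_epochs=20.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```

The total train batch size of 256 follows from `per_device_train_batch_size` (32) times `gradient_accumulation_steps` (8).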
### Training results
| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 6.1724        | 0.9994  | 1470  | 5.5103          | 0.1759   |
| 4.5289        | 1.9992  | 2940  | 5.4000          | 0.1844   |
| 3.8901        | 2.9991  | 4410  | 5.3044          | 0.1895   |
| 3.7154        | 3.9996  | 5881  | 5.2299          | 0.1952   |
| 3.4885        | 4.9994  | 7351  | 5.1806          | 0.1983   |
| 3.4097        | 5.9992  | 8821  | 5.1625          | 0.1984   |
| 3.3049        | 6.9991  | 10291 | 5.1184          | 0.1994   |
| 3.2579        | 7.9996  | 11762 | 5.1354          | 0.2021   |
| 3.2058        | 8.9994  | 13232 | 5.1414          | 0.2010   |
| 3.1678        | 9.9992  | 14702 | 5.1105          | 0.2010   |
| 3.143         | 10.9991 | 16172 | 5.0866          | 0.1999   |
| 3.1069        | 11.9996 | 17643 | 5.1130          | 0.2012   |
| 3.1019        | 12.9994 | 19113 | 5.1315          | 0.2012   |
| 3.0681        | 13.9992 | 20583 | 5.1160          | 0.2007   |
### Framework versions
- Transformers 4.46.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.20.0
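A hedged usage sketch, assuming the checkpoint is a causal language model compatible with `AutoModelForCausalLM` (the card does not state the architecture, and the repository id below is a placeholder for the actual Hub path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; replace with the actual Hub path.
repo_id = "long_first_headfinal_seed-42_1e-3"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from an example prompt.
inputs = tokenizer("Example input", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```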