metadata
tags:
- generated_from_trainer
datasets:
- kanishka/counterfactual_babylm_aann_high_variability_noun
metrics:
- accuracy
model-index:
- name: >-
    smolm-autoreg-bpe-counterfactual_babylm_aann_high_variability_noun-seed_1024-1e-3
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: kanishka/counterfactual_babylm_aann_high_variability_noun
      type: kanishka/counterfactual_babylm_aann_high_variability_noun
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.41050110032238096
smolm-autoreg-bpe-counterfactual_babylm_aann_high_variability_noun-seed_1024-1e-3
This model was trained from scratch on the kanishka/counterfactual_babylm_aann_high_variability_noun dataset. After the final (20th) training epoch, it achieves the following results on the evaluation set:
- Loss: 3.4042
- Accuracy: 0.4105
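For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which this card does not state explicitly; the variable names are illustrative.

```python
import math

# Evaluation loss reported above. Assumed to be the mean per-token
# cross-entropy in nats (an assumption, not stated in this card).
eval_loss = 3.4042

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption, the perplexity comes out at roughly 30.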
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
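To make the warmup and decay settings concrete, here is a minimal sketch of a linear schedule with warmup, written in plain Python rather than taken from the training code. It assumes the total number of optimizer steps equals the final step in the training results table (371,880) and that the rate decays linearly to zero; `linear_lr` is a hypothetical helper name.

```python
PEAK_LR = 1e-3          # learning_rate from the list above
WARMUP_STEPS = 32_000   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 371_880   # assumption: final step in the training results table


def linear_lr(step: int) -> float:
    """Learning rate at a step: linear warmup to the peak, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Print the rate at a few points: start, end of epoch 1, end of warmup,
# end of epoch 10, and the last step.
for step in (0, 18_594, 32_000, 185_940, 371_880):
    print(step, f"{linear_lr(step):.6f}")
```

Under these assumptions, warmup would end partway through the second epoch, since 32,000 warmup steps exceed the 18,594 steps per epoch shown in the results table.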
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
3.5971 | 1.0 | 18594 | 3.7361 | 0.3618 |
3.3831 | 2.0 | 37188 | 3.5640 | 0.3810 |
3.2562 | 3.0 | 55782 | 3.4518 | 0.3928 |
3.1823 | 4.0 | 74376 | 3.4151 | 0.3979 |
3.123 | 5.0 | 92970 | 3.3895 | 0.4020 |
3.0779 | 6.0 | 111564 | 3.3728 | 0.4045 |
3.0394 | 7.0 | 130158 | 3.3513 | 0.4068 |
3.0098 | 8.0 | 148752 | 3.3329 | 0.4091 |
2.9848 | 9.0 | 167346 | 3.3543 | 0.4096 |
2.9607 | 10.0 | 185940 | 3.3317 | 0.4102 |
2.9381 | 11.0 | 204534 | 3.3509 | 0.4096 |
2.9132 | 12.0 | 223128 | 3.3383 | 0.4106 |
2.8884 | 13.0 | 241722 | 3.3772 | 0.4105 |
2.8698 | 14.0 | 260316 | 3.3457 | 0.4117 |
2.8512 | 15.0 | 278910 | 3.3592 | 0.4110 |
2.828 | 16.0 | 297504 | 3.3782 | 0.4106 |
2.813 | 17.0 | 316098 | 3.3778 | 0.4109 |
2.7978 | 18.0 | 334692 | 3.3931 | 0.4105 |
2.7756 | 19.0 | 353286 | 3.3947 | 0.4107 |
2.7571 | 20.0 | 371880 | 3.4042 | 0.4105 |
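The table can be scanned programmatically to find where validation metrics peak. The sketch below copies the epoch, validation loss, and accuracy columns from the table; the variable names are illustrative, and nothing here is part of the original training code.

```python
# (epoch, validation_loss, accuracy) copied from the table above.
results = [
    (1, 3.7361, 0.3618), (2, 3.5640, 0.3810), (3, 3.4518, 0.3928),
    (4, 3.4151, 0.3979), (5, 3.3895, 0.4020), (6, 3.3728, 0.4045),
    (7, 3.3513, 0.4068), (8, 3.3329, 0.4091), (9, 3.3543, 0.4096),
    (10, 3.3317, 0.4102), (11, 3.3509, 0.4096), (12, 3.3383, 0.4106),
    (13, 3.3772, 0.4105), (14, 3.3457, 0.4117), (15, 3.3592, 0.4110),
    (16, 3.3782, 0.4106), (17, 3.3778, 0.4109), (18, 3.3931, 0.4105),
    (19, 3.3947, 0.4107), (20, 3.4042, 0.4105),
]

# Epoch with the lowest validation loss and epoch with the highest accuracy.
best_loss = min(results, key=lambda row: row[1])
best_acc = max(results, key=lambda row: row[2])
final = results[-1]

print("lowest validation loss:", best_loss)
print("highest accuracy:", best_acc)
print("final epoch:", final)
```

Validation loss is lowest at epoch 10 (3.3317) and accuracy is highest at epoch 14 (0.4117), while the reported final-epoch values are 3.4042 and 0.4105. Training loss keeps falling throughout, which may indicate mild overfitting in the later epochs.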
Framework versions
- Transformers 4.41.0
- Pytorch 2.2.0+cu121
- Datasets 2.16.1
- Tokenizers 0.19.1