Baby-Llama-58M-ORIGINAL

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following result on the evaluation set:

  • Loss: 4.1715
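If the reported loss is the mean per-token cross-entropy (the usual metric for causal language modeling; an assumption, since the card does not say), it corresponds to a perplexity of roughly 64.8:

```python
import math

# Hedged sketch: assumes the evaluation loss is the mean per-token
# cross-entropy, in which case perplexity is simply its exponential.
eval_loss = 4.1715
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 64.8
```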

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00025
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP
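The learning-rate schedule above (cosine with 50 warmup steps) can be sketched in plain Python. This is a hedged approximation of the schedule Transformers applies for `lr_scheduler_type: cosine`; the total of 960 steps is taken from the final row of the results table below.

```python
import math

# Sketch of a linear-warmup + cosine-decay learning-rate schedule,
# approximating the behavior of Transformers' cosine scheduler.
# BASE_LR and WARMUP_STEPS come from the hyperparameters above;
# TOTAL_STEPS (960) from the training results table.
BASE_LR = 2.5e-4
WARMUP_STEPS = 50
TOTAL_STEPS = 960

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from the base rate down toward 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(50), lr_at(960))
```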

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 289.4006 | 1.0 | 12 | 243.1602 |
| 229.6677 | 2.0 | 24 | 201.2611 |
| 207.9305 | 3.0 | 36 | 172.3865 |
| 125.9577 | 4.0 | 48 | 107.6148 |
| 84.8722 | 5.0 | 60 | 72.6301 |
| 48.2019 | 6.0 | 72 | 40.8081 |
| 25.796 | 7.0 | 84 | 22.0368 |
| 15.6686 | 8.0 | 96 | 13.6649 |
| 9.8745 | 9.0 | 108 | 9.6135 |
| 7.9539 | 10.0 | 120 | 8.0438 |
| 6.4873 | 11.0 | 132 | 7.1011 |
| 6.0896 | 12.0 | 144 | 6.4537 |
| 5.4521 | 13.0 | 156 | 6.0605 |
| 5.5516 | 14.0 | 168 | 6.0324 |
| 4.7538 | 15.0 | 180 | 5.7866 |
| 4.8229 | 16.0 | 192 | 5.5738 |
| 4.568 | 17.0 | 204 | 5.5282 |
| 4.4449 | 18.0 | 216 | 5.4060 |
| 4.6567 | 19.0 | 228 | 5.3382 |
| 4.1888 | 20.0 | 240 | 5.2407 |
| 4.2102 | 21.0 | 252 | 5.2085 |
| 4.6584 | 22.0 | 264 | 5.0947 |
| 4.102 | 23.0 | 276 | 4.9988 |
| 4.3574 | 24.0 | 288 | 4.9768 |
| 4.0571 | 25.0 | 300 | 4.9552 |
| 4.22 | 26.0 | 312 | 4.9127 |
| 3.9908 | 27.0 | 324 | 4.9050 |
| 4.0273 | 28.0 | 336 | 4.7905 |
| 4.0092 | 29.0 | 348 | 4.8265 |
| 3.9705 | 30.0 | 360 | 4.7823 |
| 4.0081 | 31.0 | 372 | 4.7383 |
| 3.8771 | 32.0 | 384 | 4.6774 |
| 3.899 | 33.0 | 396 | 4.6629 |
| 3.4711 | 34.0 | 408 | 4.6603 |
| 3.4489 | 35.0 | 420 | 4.5675 |
| 3.5063 | 36.0 | 432 | 4.5751 |
| 3.6348 | 37.0 | 444 | 4.5786 |
| 3.6931 | 38.0 | 456 | 4.5513 |
| 3.7022 | 39.0 | 468 | 4.5208 |
| 3.6842 | 40.0 | 480 | 4.5146 |
| 3.4084 | 41.0 | 492 | 4.5171 |
| 3.5141 | 42.0 | 504 | 4.4681 |
| 3.2337 | 43.0 | 516 | 4.4700 |
| 3.4376 | 44.0 | 528 | 4.4472 |
| 3.2911 | 45.0 | 540 | 4.4462 |
| 3.6011 | 46.0 | 552 | 4.4115 |
| 3.8547 | 47.0 | 564 | 4.3901 |
| 3.3866 | 48.0 | 576 | 4.3873 |
| 3.4543 | 49.0 | 588 | 4.3904 |
| 3.6357 | 50.0 | 600 | 4.3693 |
| 3.5045 | 51.0 | 612 | 4.3569 |
| 3.0792 | 52.0 | 624 | 4.3263 |
| 3.2731 | 53.0 | 636 | 4.3322 |
| 3.4193 | 54.0 | 648 | 4.3012 |
| 3.1097 | 55.0 | 660 | 4.3015 |
| 3.088 | 56.0 | 672 | 4.2914 |
| 2.9444 | 57.0 | 684 | 4.2750 |
| 3.362 | 58.0 | 696 | 4.2612 |
| 3.2228 | 59.0 | 708 | 4.2647 |
| 2.9892 | 60.0 | 720 | 4.2417 |
| 3.0214 | 61.0 | 732 | 4.2287 |
| 3.3049 | 62.0 | 744 | 4.2328 |
| 3.4639 | 63.0 | 756 | 4.2200 |
| 3.2505 | 64.0 | 768 | 4.2130 |
| 3.0121 | 65.0 | 780 | 4.2087 |
| 3.3112 | 66.0 | 792 | 4.2001 |
| 3.3258 | 67.0 | 804 | 4.2013 |
| 2.9143 | 68.0 | 816 | 4.1952 |
| 3.1404 | 69.0 | 828 | 4.1876 |
| 3.495 | 70.0 | 840 | 4.1910 |
| 3.134 | 71.0 | 852 | 4.1841 |
| 3.1945 | 72.0 | 864 | 4.1835 |
| 3.1116 | 73.0 | 876 | 4.1742 |
| 3.2141 | 74.0 | 888 | 4.1743 |
| 3.3962 | 75.0 | 900 | 4.1734 |
| 2.8472 | 76.0 | 912 | 4.1721 |
| 3.2455 | 77.0 | 924 | 4.1717 |
| 2.9047 | 78.0 | 936 | 4.1724 |
| 3.4182 | 79.0 | 948 | 4.1716 |
| 3.5737 | 80.0 | 960 | 4.1715 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model size

  • 46.5M params (Safetensors, F32 tensors)