---
library_name: transformers
language:
  - en
license: apache-2.0
base_model: BEE-spoke-data/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024
tags:
  - generated_from_trainer
model-index:
  - name: >-
      tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024-infinity-instruct-7m-T2T_en-1024-v2
    results: []
---

tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024-infinity-instruct-7m-T2T_en-1024-v2

This model is a fine-tuned version of BEE-spoke-data/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024 on the pszemraj/infinity-instruct-7m-T2T_en dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1159
  • Num Input Tokens Seen: 810839096
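Since the base model is a T5-style encoder-decoder (16 encoder / 32 decoder layers), the checkpoint can presumably be loaded with the standard seq2seq classes. A minimal inference sketch follows; the repo id is illustrative, not confirmed by this card.

```python
# Minimal inference sketch. Assumes the fine-tuned checkpoint is a T5-style
# seq2seq model; the repo id below is hypothetical, inferred from the card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "amazingvince/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024-infinity-instruct-7m-T2T_en-1024-v2"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```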

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 6969
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
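As a rough guide to reproducing this configuration, here is a sketch of the corresponding TrainingArguments. It assumes the run used the Hugging Face Trainer on 2 GPUs; output_dir and the eval cadence are illustrative (the 2000-step interval is inferred from the results table below), and the stated Adam betas/epsilon match the library defaults so they are not set explicitly.

```python
# Sketch of TrainingArguments mirroring the hyperparameters above.
# Not the original training script; paths and eval settings are assumptions.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./tFINE-900m-infinity-instruct-v2",  # hypothetical path
    learning_rate=5e-05,
    per_device_train_batch_size=4,   # "train_batch_size" above is per device
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # 4 x 2 GPUs x 16 = 128 effective batch
    seed=6969,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
    eval_strategy="steps",           # evaluation every 2000 steps per the table
    eval_steps=2000,
)
```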

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 1.234         | 0.0969 | 2000  | 1.2439          | 78067836          |
| 1.2248        | 0.1938 | 4000  | 1.2256          | 156868756         |
| 1.2024        | 0.2907 | 6000  | 1.2009          | 235148092         |
| 1.2074        | 0.3876 | 8000  | 1.1777          | 313452856         |
| 1.1617        | 0.4845 | 10000 | 1.1597          | 392316428         |
| 1.1755        | 0.5815 | 12000 | 1.1437          | 471101508         |
| 1.1473        | 0.6784 | 14000 | 1.1321          | 549831184         |
| 1.1743        | 0.7753 | 16000 | 1.1244          | 628937800         |
| 1.137         | 0.8722 | 18000 | 1.1179          | 707117360         |
| 1.0713        | 0.9691 | 20000 | 1.1160          | 785755388         |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1