metadata
library_name: transformers
language:
  - en
license: apache-2.0
base_model: BEE-spoke-data/tFINE-900m-e16-d32-instruct
tags:
  - generated_from_trainer
datasets:
  - pszemraj/infinity-instruct-7m-T2T_en

BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e

A second epoch of fine-tuning on the same dataset, using a different seed.

This model is a fine-tuned version of BEE-spoke-data/tFINE-900m-e16-d32-instruct on the pszemraj/infinity-instruct-7m-T2T_en dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1159
  • Num Input Tokens Seen: 810839096
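
A minimal loading sketch follows. It assumes the checkpoint is an encoder-decoder (text-to-text) model that loads through the `transformers` `AutoModelForSeq2SeqLM` class; the card itself does not state this, although the `e16-d32` in the name and the T2T dataset suggest it. The `generate` helper, the prompt, and the `max_new_tokens` value are illustrative, not settings taken from this card.

```python
# Repo id taken from the card title.
MODEL_ID = "BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: run one prompt through the model and return text.

    Assumes a seq2seq checkpoint; this is not confirmed by the model card.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the weights on first use.
    print(generate("Explain what a learning rate warmup is in one sentence."))
```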

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 6969
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
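
The total batch sizes listed above follow from the per-device values. A small sketch of the arithmetic, assuming the usual convention that the total is per-device batch size times number of devices times gradient accumulation steps (the function name is illustrative):

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval step)."""
    return per_device * num_devices * grad_accum


# Values from the hyperparameter list above.
train_total = effective_batch_size(per_device=4, num_devices=2, grad_accum=16)
# Evaluation does not accumulate gradients, so grad_accum stays at 1.
eval_total = effective_batch_size(per_device=4, num_devices=2)

print(train_total, eval_total)  # 128 8
```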

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.234 | 0.0969 | 2000 | 1.2439 | 78067836 |
| 1.2248 | 0.1938 | 4000 | 1.2256 | 156868756 |
| 1.2024 | 0.2907 | 6000 | 1.2009 | 235148092 |
| 1.2074 | 0.3876 | 8000 | 1.1777 | 313452856 |
| 1.1617 | 0.4845 | 10000 | 1.1597 | 392316428 |
| 1.1755 | 0.5815 | 12000 | 1.1437 | 471101508 |
| 1.1473 | 0.6784 | 14000 | 1.1321 | 549831184 |
| 1.1743 | 0.7753 | 16000 | 1.1244 | 628937800 |
| 1.137 | 0.8722 | 18000 | 1.1179 | 707117360 |
| 1.0713 | 0.9691 | 20000 | 1.1160 | 785755388 |
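
The learning-rate schedule implied by these settings can be approximated from the last logged row. The sketch below is an estimate, not the logged schedule: it assumes the total step count can be extrapolated from step 20000 at epoch 0.9691, that warmup steps are the ceiling of `warmup_ratio * total_steps`, and that the cosine schedule decays to zero over a single half-cycle. All function and variable names are illustrative.

```python
import math

PEAK_LR = 5e-5          # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.03     # lr_scheduler_warmup_ratio
LAST_STEP, LAST_EPOCH = 20000, 0.9691  # final row of the results table

# Extrapolate the step count for one full epoch (an estimate, not a logged value).
total_steps = round(LAST_STEP / LAST_EPOCH)
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, 10000, LAST_STEP):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate at step 20000 is already very close to zero, which fits the small change in validation loss between the last two rows.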