---
library_name: transformers
license: apache-2.0
base_model: BEE-spoke-data/tFINE-900m-e16-d32-flan
tags:
  - generated_from_trainer
model-index:
  - name: tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024
    results: []
---

tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024

This model is a fine-tuned version of BEE-spoke-data/tFINE-900m-e16-d32-flan. The training dataset is not documented in this card, though the model name suggests the T2T_en subset of Infinity-Instruct 7M. It achieves the following results on the evaluation set (a short perplexity note follows the bullets):

  • Loss: 1.3589
  • Num Input Tokens Seen: 785148304
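
For intuition, the final evaluation loss corresponds to a token-level perplexity of roughly exp(1.3589) ≈ 3.9, assuming the reported value is the standard mean cross-entropy loss:

```python
import math

eval_loss = 1.3589                 # final validation loss reported above
perplexity = math.exp(eval_loss)   # ≈ 3.89, assuming mean token cross-entropy
print(f"perplexity ≈ {perplexity:.2f}")
```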

Model description

More information needed

Intended uses & limitations

More information needed
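
Usage is not documented yet. As a minimal sketch, assuming the checkpoint keeps the encoder-decoder (T5-style) layout of the base model and is published under the name above (the hub path below is an assumption), it can be loaded with the standard transformers seq2seq API:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed hub path; substitute the actual repository id of this checkpoint.
model_id = "amazingvince/tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Summarize: gradient accumulation lets small per-device batches emulate a larger effective batch size."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```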

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 17868
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
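
The total train batch size of 128 is the per-device batch size (4) × number of devices (2) × gradient accumulation steps (16). A minimal sketch of these settings expressed as transformers Seq2SeqTrainingArguments (the actual training script is not published, so treat this as illustrative rather than the exact configuration used):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
args = Seq2SeqTrainingArguments(
    output_dir="tFINE-900m-e16-d32-flan-infinity-instruct-7m-T2T_en-1024",
    learning_rate=5e-5,
    per_device_train_batch_size=4,   # 4 x 2 GPUs x 16 accumulation steps = 128 effective
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    seed=17868,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1.0,
)
```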

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 1.6487        | 0.0969 | 2000  | 1.7665          | 78885660          |
| 1.4957        | 0.1938 | 4000  | 1.6085          | 157778628         |
| 1.4224        | 0.2907 | 6000  | 1.5239          | 236103764         |
| 1.3764        | 0.3877 | 8000  | 1.4715          | 314442716         |
| 1.3553        | 0.4846 | 10000 | 1.4268          | 392909044         |
| 1.3308        | 0.5815 | 12000 | 1.4009          | 471314876         |
| 1.2622        | 0.6784 | 14000 | 1.3831          | 550234352         |
| 1.2585        | 0.7753 | 16000 | 1.3684          | 628560668         |
| 1.2477        | 0.8722 | 18000 | 1.3608          | 707047904         |
| 1.216         | 0.9691 | 20000 | 1.3589          | 785148304         |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1