
sft

This model is a fine-tuned version of rinna/llama-3-youko-8b on the data_bricks, kunishou, ichikara-004-multi, ichikara-004-single, apto_instruct, apto_dialogue, oasst_ja, and megagon datasets. It achieves the following results on the evaluation set:

  • Loss: 1.0135
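
Below is a minimal inference sketch using the transformers library. The repository id is a placeholder (the card does not state the published model path), and the plain-text prompting style is assumed to follow the base rinna/llama-3-youko-8b model.

```python
# Minimal inference sketch. Assumptions: "your-org/sft" is a placeholder repository id,
# and plain-text prompting works as for the base rinna/llama-3-youko-8b model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/sft"  # placeholder; replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the checkpoint weights are stored in FP16
    device_map="auto",
)

prompt = "日本で一番高い山は何ですか？"  # "What is the highest mountain in Japan?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```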

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 2.0
  • mixed_precision_training: Native AMP
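
For reference, the hyperparameters above map roughly onto the following transformers TrainingArguments. This is a sketch, not the original training script; it assumes the 0.1 warmup value was specified as a ratio and that the Trainer's default AdamW implementation (betas=(0.9, 0.999), eps=1e-08) was used.

```python
# Approximate reconstruction of the hyperparameters above as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft",
    learning_rate=1e-4,
    per_device_train_batch_size=8,   # x 8 GPUs x 2 accumulation steps = 128 total
    per_device_eval_batch_size=16,   # x 8 GPUs = 128 total
    gradient_accumulation_steps=2,
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # assumption: the "warmup_steps: 0.1" entry is a ratio
    seed=42,
    fp16=True,                       # Native AMP mixed precision
    optim="adamw_torch",             # AdamW with default betas=(0.9, 0.999), eps=1e-08
)
```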

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.9155        | 0.1433 | 100  | 1.8934          |
| 1.7826        | 0.2865 | 200  | 1.7808          |
| 1.6897        | 0.4298 | 300  | 1.6719          |
| 1.5887        | 0.5731 | 400  | 1.5738          |
| 1.4628        | 0.7163 | 500  | 1.4660          |
| 1.3751        | 0.8596 | 600  | 1.3671          |
| 1.1263        | 1.0029 | 700  | 1.2831          |
| 0.688         | 1.1461 | 800  | 1.2492          |
| 0.6544        | 1.2894 | 900  | 1.1818          |
| 0.6017        | 1.4327 | 1000 | 1.1207          |
| 0.5763        | 1.5759 | 1100 | 1.0708          |
| 0.5599        | 1.7192 | 1200 | 1.0365          |
| 0.5101        | 1.8625 | 1300 | 1.0170          |
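
If the reported loss is the mean per-token cross-entropy, the final validation loss of 1.0135 corresponds to a perplexity of roughly exp(1.0135) ≈ 2.76.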

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1