
Llama-31-8B_task-3_120-samples_config-3

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the GaetanMichelet/chat-60_ft_task-3 and the GaetanMichelet/chat-120_ft_task-3 datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4408
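
As a rough illustration (not part of the original card), the adapter can typically be loaded on top of the base model with peft. The repository ids below come from this card; the dtype, device placement, and chat formatting are assumptions, not the author's documented usage.

```python
# Minimal sketch: attach the fine-tuned PEFT adapter to the base model.
# Assumes access to meta-llama/Meta-Llama-3.1-8B-Instruct and this adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "GaetanMichelet/Llama-31-8B_task-3_120-samples_config-3"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # dtype is an assumption; the card does not state it
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # load the adapter weights

# Example chat-style prompt; the task-specific prompt format is an assumption.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```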

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 150
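
For orientation only, the values above might map onto a transformers TrainingArguments setup roughly as sketched below. This is a reconstruction from the list, not the author's actual training script; the output directory, precision, logging strategy, and any early-stopping setup are assumptions.

```python
# Sketch of how the reported hyperparameters could be expressed with
# transformers.TrainingArguments; the numeric values come from the list above,
# everything else (output_dir, bf16, evaluation/saving strategy) is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Llama-31-8B_task-3_120-samples_config-3",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # gives the total train batch size of 8
    num_train_epochs=150,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    bf16=True,                       # precision is an assumption; not stated in the card
    eval_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
)
```

Note that training appears to have stopped after 31 of the configured 150 epochs; the validation-loss trajectory below is consistent with early stopping around the epoch-24 checkpoint, which matches the reported loss of 0.4408.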

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 2.3729        | 1.0   | 11   | 2.4972          |
| 2.6938        | 2.0   | 22   | 2.4571          |
| 2.6474        | 3.0   | 33   | 2.3881          |
| 2.2763        | 4.0   | 44   | 2.2642          |
| 2.0268        | 5.0   | 55   | 2.0694          |
| 1.7309        | 6.0   | 66   | 1.7871          |
| 1.4481        | 7.0   | 77   | 1.4330          |
| 1.0554        | 8.0   | 88   | 1.0675          |
| 0.8392        | 9.0   | 99   | 0.7563          |
| 0.4685        | 10.0  | 110  | 0.6437          |
| 0.3588        | 11.0  | 121  | 0.5851          |
| 0.6319        | 12.0  | 132  | 0.5407          |
| 0.4211        | 13.0  | 143  | 0.5248          |
| 0.495         | 14.0  | 154  | 0.5127          |
| 0.4232        | 15.0  | 165  | 0.5019          |
| 0.496         | 16.0  | 176  | 0.5103          |
| 0.3903        | 17.0  | 187  | 0.4814          |
| 0.331         | 18.0  | 198  | 0.4913          |
| 0.2403        | 19.0  | 209  | 0.4869          |
| 0.3563        | 20.0  | 220  | 0.4718          |
| 0.4107        | 21.0  | 231  | 0.4596          |
| 0.2631        | 22.0  | 242  | 0.4478          |
| 0.4212        | 23.0  | 253  | 0.4496          |
| 0.3304        | 24.0  | 264  | 0.4408          |
| 0.3296        | 25.0  | 275  | 0.4437          |
| 0.3266        | 26.0  | 286  | 0.4441          |
| 0.1403        | 27.0  | 297  | 0.4496          |
| 0.1732        | 28.0  | 308  | 0.4574          |
| 0.1797        | 29.0  | 319  | 0.4809          |
| 0.1355        | 30.0  | 330  | 0.4990          |
| 0.1346        | 31.0  | 341  | 0.5313          |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1