---
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
  - generator
library_name: peft
license: apache-2.0
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: mistral_7b_cosine_lr
    results: []
---

mistral_7b_cosine_lr

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the generator dataset. It achieves the following results on the evaluation set:

  • Loss: 5.3993
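
A minimal loading sketch is shown below, not the author's exact code. The adapter repository id "zeeshan73/mistral_7b_cosine_lr" is assumed from the model name above and may need to be adjusted, and the prompt is only a placeholder.

```python
# Minimal sketch for loading the PEFT adapter on top of the base model.
# The adapter repo id is an assumption; device_map="auto" requires accelerate.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "zeeshan73/mistral_7b_cosine_lr"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Placeholder prompt for a quick generation check.
inputs = tokenizer("Summarize what a LoRA adapter is.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```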

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.003
  • train_batch_size: 3
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 4
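
As a rough guide, the sketch below shows how these values map onto transformers.TrainingArguments. It is a reconstruction, not the author's training script: the output directory is a placeholder, and the dataset, tokenizer, and LoRA/PEFT settings used for SFT are not documented in this card.

```python
# Hypothetical reconstruction of the reported configuration; dataset, tokenizer,
# and LoRA settings are not documented in this card and are omitted here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral_7b_cosine_lr",  # placeholder
    learning_rate=3e-3,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,      # 3 x 8 = total train batch size of 24
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    warmup_steps=15,                    # when > 0, this takes precedence over warmup_ratio
    seed=42,
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), epsilon=1e-08
)
```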

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 11.1885       | 0.0549 | 10   | 61.4970         |
| 37.6512       | 0.1098 | 20   | 12.9405         |
| 14.576        | 0.1647 | 30   | 27.9852         |
| 9.5892        | 0.2196 | 40   | 6.4722          |
| 7.7639        | 0.2745 | 50   | 6.8158          |
| 6.3878        | 0.3294 | 60   | 6.3811          |
| 6.6118        | 0.3844 | 70   | 5.9281          |
| 6.006         | 0.4393 | 80   | 5.6753          |
| 6.1011        | 0.4942 | 90   | 5.8083          |
| 5.7396        | 0.5491 | 100  | 5.6193          |
| 5.5128        | 0.6040 | 110  | 5.4848          |
| 5.4599        | 0.6589 | 120  | 5.4267          |
| 5.5193        | 0.7138 | 130  | 5.4757          |
| 5.4488        | 0.7687 | 140  | 5.4422          |
| 5.4257        | 0.8236 | 150  | 5.3845          |
| 5.3938        | 0.8785 | 160  | 5.3727          |
| 5.3937        | 0.9334 | 170  | 5.3646          |
| 5.3916        | 0.9883 | 180  | 5.4825          |
| 5.4217        | 1.0432 | 190  | 5.3534          |
| 5.3915        | 1.0981 | 200  | 5.3497          |
| 5.3656        | 1.1531 | 210  | 5.3416          |
| 5.3718        | 1.2080 | 220  | 5.3691          |
| 5.3763        | 1.2629 | 230  | 5.4102          |
| 5.4039        | 1.3178 | 240  | 5.3993          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.45.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0
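
To compare your local environment against these versions, the small sketch below simply prints the installed package versions; it does not install or pin anything.

```python
# Prints locally installed versions for comparison with the values listed above.
import peft, transformers, torch, datasets, tokenizers

for name, module in [
    ("PEFT", peft),
    ("Transformers", transformers),
    ("PyTorch", torch),
    ("Datasets", datasets),
    ("Tokenizers", tokenizers),
]:
    print(name, module.__version__)
```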