# Mistral-7B-Instruct-v0.2-mirage-all-teacher-instruct-mistral-sft
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.9628

## Model description

More information needed

## Intended uses & limitations

More information needed
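
The card gives no usage guidance, but the PEFT entry under framework versions suggests this repo holds a parameter-efficient (LoRA-style) adapter on top of the base model rather than merged full weights. A minimal loading sketch under that assumption (the tokenizer is assumed to be saved alongside the adapter; fall back to the base model's tokenizer if it is not):

```python
# Hedged usage sketch, not from the card itself. Assumes the repo holds a
# PEFT adapter whose config points at mistralai/Mistral-7B-Instruct-v0.2.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "nthakur/Mistral-7B-Instruct-v0.2-mirage-all-teacher-instruct-mistral-sft"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model
# it references, and attaches the adapter weights in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # precision is an assumption, not from the card
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Mistral-Instruct models expect the chat template applied to messages.
messages = [{"role": "user", "content": "What is retrieval-augmented generation?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```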

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
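
The exact training script is not published; as a hedged reconstruction, the list above maps onto `transformers.TrainingArguments` roughly as follows (`output_dir` is a placeholder, and the optimizer string reflects the Trainer's Adam/AdamW default rather than a confirmed choice):

```python
# Hedged reconstruction of the listed hyperparameters; not the author's script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-sft-output",  # placeholder, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,    # 4 GPUs x 4 per device = total batch size 16
    per_device_eval_batch_size=4,
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",              # betas=(0.9, 0.999) and eps=1e-8 are the defaults
)
# Launched across 4 GPUs (e.g. via `accelerate launch` or `torchrun`), this
# yields total_train_batch_size = 16 with no gradient accumulation implied.
```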

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.3478        | 0.0412 | 200  | 1.2310          |
| 1.3495        | 0.0824 | 400  | 1.1826          |
| 1.3753        | 0.1237 | 600  | 1.1557          |
| 1.3454        | 0.1649 | 800  | 1.1297          |
| 1.2731        | 0.2061 | 1000 | 1.1071          |
| 1.3863        | 0.2473 | 1200 | 1.0878          |
| 1.2567        | 0.2885 | 1400 | 1.0777          |
| 1.257         | 0.3298 | 1600 | 1.0630          |
| 1.2129        | 0.3710 | 1800 | 1.0518          |
| 1.1939        | 0.4122 | 2000 | 1.0405          |
| 1.2658        | 0.4534 | 2200 | 1.0313          |
| 1.1718        | 0.4946 | 2400 | 1.0186          |
| 1.1795        | 0.5359 | 2600 | 1.0102          |
| 1.1984        | 0.5771 | 2800 | 1.0008          |
| 1.157         | 0.6183 | 3000 | 0.9930          |
| 1.1542        | 0.6595 | 3200 | 0.9862          |
| 1.1648        | 0.7007 | 3400 | 0.9802          |
| 1.1403        | 0.7420 | 3600 | 0.9750          |
| 1.1268        | 0.7832 | 3800 | 0.9705          |
| 1.2122        | 0.8244 | 4000 | 0.9672          |
| 1.0571        | 0.8656 | 4200 | 0.9649          |
| 1.0903        | 0.9068 | 4400 | 0.9635          |
| 1.178         | 0.9481 | 4600 | 0.9629          |
| 1.1661        | 0.9893 | 4800 | 0.9628          |

### Framework versions

- PEFT 0.7.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1