# final_model_5
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.8725
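For context (an inference from the number above, not a figure stated in the card): if this loss is the mean token-level cross-entropy in nats, as is the default for causal-LM evaluation in `transformers`, the corresponding perplexity would be

$$\mathrm{PPL} = e^{\mathcal{L}} = e^{2.8725} \approx 17.7$$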
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
The following `bitsandbytes` quantization config was used during training (a sketch of the equivalent `BitsAndBytesConfig` follows the list):
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: bfloat16
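A minimal sketch (not the author's actual script) of how the config above maps onto `transformers`' `BitsAndBytesConfig` when loading the base model:

```python
# Sketch: reconstruct the 4-bit NF4 quantization config listed above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_8bit stays False
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",                      # requires accelerate
)
```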
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 90
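A minimal sketch (assumed, not the original training script) mapping the hyperparameters above onto `transformers`' `TrainingArguments`; the output path is hypothetical:

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="final_model_5",        # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,     # 8 x 2 = 16 effective train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_steps=90,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default,
    # so no explicit optimizer argument is needed here.
)
```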
### Training results

Validation loss reaches its minimum (2.6431) at step 15 and climbs steadily afterwards while the training loss keeps shrinking, so by validation loss the final step-90 checkpoint is not the best of the run.
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
0.0462 | 1.0 | 1 | 2.5821 |
0.0463 | 2.0 | 2 | 2.6255 |
0.0327 | 3.0 | 3 | 2.7177 |
0.0374 | 4.0 | 4 | 2.7702 |
0.0465 | 5.0 | 5 | 2.7528 |
0.029 | 6.0 | 6 | 2.7269 |
0.0239 | 7.0 | 7 | 2.6977 |
0.0284 | 8.0 | 8 | 2.6762 |
0.019 | 9.0 | 9 | 2.6788 |
0.0184 | 10.0 | 10 | 2.6653 |
0.0283 | 11.0 | 11 | 2.6582 |
0.0232 | 12.0 | 12 | 2.6511 |
0.0161 | 13.0 | 13 | 2.6508 |
0.0158 | 14.0 | 14 | 2.6450 |
0.0147 | 15.0 | 15 | 2.6431 |
0.0156 | 16.0 | 16 | 2.6449 |
0.014 | 17.0 | 17 | 2.6488 |
0.0139 | 18.0 | 18 | 2.6530 |
0.0137 | 19.0 | 19 | 2.6587 |
0.0136 | 20.0 | 20 | 2.6646 |
0.0135 | 21.0 | 21 | 2.6703 |
0.0134 | 22.0 | 22 | 2.6755 |
0.0133 | 23.0 | 23 | 2.6806 |
0.0131 | 24.0 | 24 | 2.6858 |
0.0131 | 25.0 | 25 | 2.6908 |
0.0129 | 26.0 | 26 | 2.6956 |
0.0128 | 27.0 | 27 | 2.7001 |
0.0127 | 28.0 | 28 | 2.7043 |
0.0125 | 29.0 | 29 | 2.7083 |
0.0123 | 30.0 | 30 | 2.7120 |
0.0121 | 31.0 | 31 | 2.7155 |
0.0121 | 32.0 | 32 | 2.7191 |
0.0117 | 33.0 | 33 | 2.7227 |
0.0115 | 34.0 | 34 | 2.7263 |
0.0113 | 35.0 | 35 | 2.7301 |
0.0111 | 36.0 | 36 | 2.7340 |
0.0108 | 37.0 | 37 | 2.7379 |
0.0106 | 38.0 | 38 | 2.7418 |
0.0104 | 39.0 | 39 | 2.7457 |
0.0104 | 40.0 | 40 | 2.7494 |
0.01 | 41.0 | 41 | 2.7532 |
0.0098 | 42.0 | 42 | 2.7569 |
0.0096 | 43.0 | 43 | 2.7606 |
0.0095 | 44.0 | 44 | 2.7643 |
0.0094 | 45.0 | 45 | 2.7681 |
0.0093 | 46.0 | 46 | 2.7720 |
0.0093 | 47.0 | 47 | 2.7760 |
0.0092 | 48.0 | 48 | 2.7802 |
0.0092 | 49.0 | 49 | 2.7846 |
0.0091 | 50.0 | 50 | 2.7892 |
0.0091 | 51.0 | 51 | 2.7940 |
0.0091 | 52.0 | 52 | 2.7989 |
0.0091 | 53.0 | 53 | 2.8039 |
0.009 | 54.0 | 54 | 2.8090 |
0.009 | 55.0 | 55 | 2.8141 |
0.0089 | 56.0 | 56 | 2.8191 |
0.0089 | 57.0 | 57 | 2.8239 |
0.0088 | 58.0 | 58 | 2.8284 |
0.0087 | 59.0 | 59 | 2.8331 |
0.0088 | 60.0 | 60 | 2.8372 |
0.0087 | 61.0 | 61 | 2.8405 |
0.0087 | 62.0 | 62 | 2.8433 |
0.0086 | 63.0 | 63 | 2.8457 |
0.0086 | 64.0 | 64 | 2.8476 |
0.0085 | 65.0 | 65 | 2.8499 |
0.0085 | 66.0 | 66 | 2.8514 |
0.0085 | 67.0 | 67 | 2.8530 |
0.0084 | 68.0 | 68 | 2.8545 |
0.0084 | 69.0 | 69 | 2.8560 |
0.0084 | 70.0 | 70 | 2.8575 |
0.0084 | 71.0 | 71 | 2.8590 |
0.0083 | 72.0 | 72 | 2.8605 |
0.0083 | 73.0 | 73 | 2.8620 |
0.0083 | 74.0 | 74 | 2.8633 |
0.0082 | 75.0 | 75 | 2.8646 |
0.0082 | 76.0 | 76 | 2.8657 |
0.0082 | 77.0 | 77 | 2.8668 |
0.0082 | 78.0 | 78 | 2.8679 |
0.0081 | 79.0 | 79 | 2.8689 |
0.0082 | 80.0 | 80 | 2.8697 |
0.0082 | 81.0 | 81 | 2.8705 |
0.0081 | 82.0 | 82 | 2.8711 |
0.0082 | 83.0 | 83 | 2.8716 |
0.0082 | 84.0 | 84 | 2.8719 |
0.0081 | 85.0 | 85 | 2.8721 |
0.0081 | 86.0 | 86 | 2.8723 |
0.0081 | 87.0 | 87 | 2.8724 |
0.0081 | 88.0 | 88 | 2.8724 |
0.0081 | 89.0 | 89 | 2.8725 |
0.0081 | 90.0 | 90 | 2.8725 |
### Framework versions
- PEFT 0.4.0
- Transformers 4.37.2
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.15.2
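Since PEFT appears in the framework list, this repo presumably hosts a PEFT adapter rather than full model weights. A hedged usage sketch (not taken from the card) for loading the adapter on top of the base model:

```python
# Sketch: attach the adapter from this repo to the base Mistral model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "hussamsal/final_model_5")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
```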