Mixtral_Alpace_v2

This model is a fine-tuned version of mistralai/Mixtral-8x7B-v0.1 on the generator dataset. It achieves the following results on the evaluation set:

Loss: 0.5617

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2.5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 15
training_steps: 1000
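
As an illustration of what the linear scheduler settings above imply, the following is a minimal sketch of a linear warmup and decay schedule in plain Python. It assumes the usual definition (ramp from 0 to the peak rate over the warmup steps, then decay linearly to 0 at the last training step); the function name `linear_lr` is hypothetical and is not part of the training code used for this model.

```python
# Values copied from the hyperparameter list above.
PEAK_LR = 2.5e-05
WARMUP_STEPS = 15
TRAINING_STEPS = 1000


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak rate.
        scale = step / max(1, WARMUP_STEPS)
    else:
        # Decay: fall linearly from the peak rate to 0 at the final step.
        scale = max(0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return PEAK_LR * scale


if __name__ == "__main__":
    for step in (0, 15, 500, 1000):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 15 and reaches zero at step 1000, which is consistent with the validation loss flattening toward the end of the run.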

Training results

Training Loss	Epoch	Step	Validation Loss
1.5577	0.0813	10	1.5534
1.4512	0.1626	20	1.4827
1.4106	0.2439	30	1.4104
1.3419	0.3252	40	1.3460
1.2361	0.4065	50	1.2827
1.2298	0.4878	60	1.2097
1.1468	0.5691	70	1.1400
1.0874	0.6504	80	1.0724
1.0372	0.7317	90	1.0088
0.9185	0.8130	100	0.9566
0.8927	0.8943	110	0.9139
0.8264	0.9756	120	0.8724
0.8799	1.0569	130	0.8329
0.8233	1.1382	140	0.7947
0.7761	1.2195	150	0.7633
0.7568	1.3008	160	0.7407
0.6957	1.3821	170	0.7224
0.6712	1.4634	180	0.7048
0.6738	1.5447	190	0.6908
0.7165	1.6260	200	0.6781
0.5913	1.7073	210	0.6673
0.6992	1.7886	220	0.6584
0.6438	1.8699	230	0.6497
0.6649	1.9512	240	0.6425
0.5907	2.0325	250	0.6358
0.6014	2.1138	260	0.6302
0.5605	2.1951	270	0.6250
0.5893	2.2764	280	0.6209
0.5761	2.3577	290	0.6166
0.6083	2.4390	300	0.6132
0.6404	2.5203	310	0.6100
0.5949	2.6016	320	0.6076
0.6208	2.6829	330	0.6047
0.6083	2.7642	340	0.6025
0.5922	2.8455	350	0.5998
0.6377	2.9268	360	0.5980
0.6059	3.0081	370	0.5960
0.6697	3.0894	380	0.5940
0.5813	3.1707	390	0.5925
0.5442	3.2520	400	0.5911
0.5060	3.3333	410	0.5889
0.5806	3.4146	420	0.5878
0.5504	3.4959	430	0.5868
0.6051	3.5772	440	0.5849
0.5952	3.6585	450	0.5838
0.5128	3.7398	460	0.5825
0.5779	3.8211	470	0.5813
0.5448	3.9024	480	0.5802
0.5559	3.9837	490	0.5796
0.6136	4.0650	500	0.5787
0.5329	4.1463	510	0.5776
0.5267	4.2276	520	0.5767
0.5492	4.3089	530	0.5763
0.5206	4.3902	540	0.5758
0.5088	4.4715	550	0.5747
0.5811	4.5528	560	0.5739
0.5865	4.6341	570	0.5728
0.5563	4.7154	580	0.5729
0.5692	4.7967	590	0.5719
0.5827	4.8780	600	0.5713
0.5551	4.9593	610	0.5715
0.5059	5.0407	620	0.5708
0.5132	5.1220	630	0.5700
0.5314	5.2033	640	0.5698
0.5614	5.2846	650	0.5696
0.5489	5.3659	660	0.5688
0.5404	5.4472	670	0.5680
0.5745	5.5285	680	0.5672
0.5083	5.6098	690	0.5673
0.5565	5.6911	700	0.5670
0.5515	5.7724	710	0.5664
0.5448	5.8537	720	0.5664
0.5276	5.9350	730	0.5657
0.5436	6.0163	740	0.5656
0.5988	6.0976	750	0.5650
0.4929	6.1789	760	0.5652
0.5957	6.2602	770	0.5645
0.4968	6.3415	780	0.5645
0.4822	6.4228	790	0.5645
0.5527	6.5041	800	0.5642
0.5663	6.5854	810	0.5640
0.4930	6.6667	820	0.5634
0.4992	6.7480	830	0.5630
0.5618	6.8293	840	0.5630
0.5680	6.9106	850	0.5626
0.4869	6.9919	860	0.5626
0.5418	7.0732	870	0.5625
0.5364	7.1545	880	0.5621
0.5675	7.2358	890	0.5621
0.4910	7.3171	900	0.5620
0.5555	7.3984	910	0.5621
0.6093	7.4797	920	0.5621
0.5529	7.5610	930	0.5620
0.5252	7.6423	940	0.5620
0.5024	7.7236	950	0.5620
0.5639	7.8049	960	0.5616
0.4676	7.8862	970	0.5618
0.5236	7.9675	980	0.5617
0.4902	8.0488	990	0.5616
0.4860	8.1301	1000	0.5617
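
A few quantities can be derived from the table with simple arithmetic. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats (so its exponential is a perplexity) and that training ran on a single device without gradient accumulation (so one step consumes `train_batch_size` examples). Neither assumption is stated in this card, and the variable names are illustrative only.

```python
import math

# Values copied from the table and hyperparameter list in this card.
FINAL_EVAL_LOSS = 0.5617   # validation loss at step 1000
EPOCH_AT_STEP_10 = 0.0813  # epoch value in the first table row
TRAIN_BATCH_SIZE = 8

# Assumption: the loss is a mean per-token cross-entropy in nats.
perplexity = math.exp(FINAL_EVAL_LOSS)

# 10 steps cover 0.0813 epochs, so one epoch takes about 10 / 0.0813 steps.
steps_per_epoch = round(10 / EPOCH_AT_STEP_10)

# Assumption: single device, no gradient accumulation.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"perplexity ~ {perplexity:.2f}")
    print(f"steps per epoch ~ {steps_per_epoch}")
    print(f"training examples ~ {approx_train_examples}")
```

With these assumptions the final validation loss corresponds to a perplexity of roughly 1.75, and the 1000 training steps cover a little over 8 passes through a training set of about a thousand examples.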

Framework versions

PEFT 0.12.0
Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
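
Because PEFT appears in the framework versions, the published weights are probably an adapter rather than a full checkpoint. The following is a hedged sketch of how such an adapter is typically loaded on top of the base model. The adapter repository id `Cem13/mixtral_semptom_0` is taken from the model page and is assumed to host a PEFT adapter; the bf16 dtype and `device_map="auto"` are illustrative choices, not settings confirmed by this card. The function name `load_model_with_adapter` is hypothetical.

```python
BASE_MODEL_ID = "mistralai/Mixtral-8x7B-v0.1"  # base model named in this card
ADAPTER_ID = "Cem13/mixtral_semptom_0"         # assumption: repo hosts a PEFT adapter


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the (assumed) PEFT adapter weights."""
    # Imported lazily so that defining this function does not need the heavy libraries.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 to reduce memory use
        device_map="auto",           # requires the accelerate package
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads the full Mixtral-8x7B base weights, so it needs substantial disk space and accelerator memory.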
