qwenvl-2B-cadica-direction-then-detect-and-classify-scale6

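This model is a fine-tuned version of ben81828/CADICA_qwenvl_direction on two datasets: CADICA狹窄分析選擇題scale6(TRAIN) (CADICA stenosis analysis, multiple-choice questions, scale6, TRAIN) and CADICA狹窄分析千問定位但不分類題scale6(TRAIN) (CADICA stenosis analysis, Qwen localization-without-classification questions, scale6, TRAIN). It achieves the following results on the evaluation set: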

Loss: 0.1728
Num Input Tokens Seen: 35316128

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 6
total_train_batch_size: 24
total_eval_batch_size: 4
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
training_steps: 3400
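
The relationship between the batch-size entries and the shape of the learning-rate schedule can be sketched as follows. This is a minimal illustration using only the values listed above. It assumes that the warmup length is the warmup ratio times the total step count, and that the cosine schedule is a linear warmup followed by a half-cosine decay to zero. The function name `lr_at` is hypothetical, and the trainer's exact implementation may differ.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1              # per-device batch size
num_devices = 4
gradient_accumulation_steps = 6
learning_rate = 1e-4
warmup_ratio = 0.05
training_steps = 3400

# Effective number of samples consumed per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Assumption: warmup length is ratio * total steps, rounded to an integer.
warmup_steps = round(warmup_ratio * training_steps)


def lr_at(step: int) -> float:
    """Illustrative schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("warmup_steps:", warmup_steps)
    for step in (0, warmup_steps, training_steps // 2, training_steps):
        print(step, lr_at(step))
```

Under these assumptions the product 1 × 4 × 6 reproduces the listed total_train_batch_size of 24, and the learning rate peaks at 0.0001 once warmup ends.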

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.3918	0.0148	50	1.0422	516240
0.8208	0.0295	100	0.8917	1030696
0.8125	0.0443	150	0.9009	1550792
0.7675	0.0591	200	0.9007	2071176
0.7558	0.0739	250	0.8108	2587272
0.78	0.0886	300	0.8194	3107200
0.6602	0.1034	350	0.7663	3625752
0.6739	0.1182	400	0.7039	4142592
0.5661	0.1329	450	0.7133	4663320
0.6283	0.1477	500	0.6505	5183664
0.5957	0.1625	550	0.6883	5703016
0.6331	0.1773	600	0.5883	6222736
0.5483	0.1920	650	0.6101	6743120
0.477	0.2068	700	0.5884	7262832
0.514	0.2216	750	0.4666	7779872
0.4239	0.2363	800	0.4822	8301976
0.4949	0.2511	850	0.6122	8822832
0.4852	0.2659	900	0.5606	9345160
0.4737	0.2806	950	0.4791	9863168
0.4005	0.2954	1000	0.5501	10379136
0.3991	0.3102	1050	0.4378	10897528
0.4624	0.3250	1100	0.5301	11413120
0.4432	0.3397	1150	0.4249	11933632
0.3296	0.3545	1200	0.2966	12456040
0.335	0.3693	1250	0.3185	12972696
0.3594	0.3840	1300	0.4716	13493264
0.3731	0.3988	1350	0.5566	14014736
0.388	0.4136	1400	0.3866	14532288
0.3131	0.4284	1450	0.4740	15050992
0.2928	0.4431	1500	0.4049	15572048
0.3588	0.4579	1550	0.2871	16091960
0.3879	0.4727	1600	0.3136	16609960
0.2698	0.4874	1650	0.4020	17130896
0.3904	0.5022	1700	0.3297	17650984
0.3173	0.5170	1750	0.4491	18169344
0.3127	0.5318	1800	0.3499	18691928
0.2828	0.5465	1850	0.3781	19212992
0.306	0.5613	1900	0.3766	19735976
0.2992	0.5761	1950	0.3468	20253288
0.2341	0.5908	2000	0.3366	20770728
0.2931	0.6056	2050	0.3386	21291664
0.1826	0.6204	2100	0.5386	21813984
0.2387	0.6352	2150	0.2581	22332144
0.2662	0.6499	2200	0.4840	22849552
0.2332	0.6647	2250	0.4966	23366784
0.2481	0.6795	2300	0.2418	23883032
0.2313	0.6942	2350	0.1870	24401256
0.262	0.7090	2400	0.3471	24921872
0.2412	0.7238	2450	0.3456	25439896
0.2382	0.7386	2500	0.2543	25961056
0.2364	0.7533	2550	0.3871	26477208
0.2082	0.7681	2600	0.3406	26997904
0.1736	0.7829	2650	0.2697	27521088
0.2225	0.7976	2700	0.4155	28042992
0.2501	0.8124	2750	0.4115	28561248
0.2507	0.8272	2800	0.3223	29079576
0.1928	0.8419	2850	0.2828	29600536
0.2029	0.8567	2900	0.3943	30118072
0.1692	0.8715	2950	0.2034	30637448
0.234	0.8863	3000	0.2556	31159736
0.2303	0.9010	3050	0.2253	31679080
0.1999	0.9158	3100	0.2710	32196176
0.2069	0.9306	3150	0.2029	32713824
0.2135	0.9453	3200	0.3564	33235872
0.1964	0.9601	3250	0.3081	33752488
0.2131	0.9749	3300	0.3541	34269496
0.1779	0.9897	3350	0.2255	34784784
0.2173	1.0044	3400	0.4078	35305984
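
Validation loss fluctuates from one checkpoint to the next, so it can help to pick out the best logged step programmatically. The sketch below parses a handful of rows copied from the table above (not the full table) and reports the row with the lowest validation loss, along with a rough tokens-per-step figure. The names `Row`, `parse_rows`, and `rows_text` are hypothetical helpers introduced for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# A subset of rows copied verbatim from the training results table.
rows_text = """
0.3296 0.3545 1200 0.2966 12456040
0.2481 0.6795 2300 0.2418 23883032
0.2313 0.6942 2350 0.1870 24401256
0.1692 0.8715 2950 0.2034 30637448
0.2173 1.0044 3400 0.4078 35305984
"""


def parse_rows(text: str) -> list[Row]:
    """Parse whitespace-separated table rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append(
            Row(float(train_loss), float(epoch), int(step),
                float(val_loss), int(tokens))
        )
    return rows


rows = parse_rows(rows_text)

# Checkpoint with the lowest logged validation loss among the parsed rows.
best = min(rows, key=lambda r: r.val_loss)

# Average input tokens per optimizer step, taken from the final row.
final = rows[-1]
tokens_per_step = final.tokens_seen / final.step

if __name__ == "__main__":
    print("best step:", best.step, "validation loss:", best.val_loss)
    print("approx. tokens per step:", round(tokens_per_step))
```

Across the full table, the lowest logged validation loss is also the 0.1870 at step 2350. The headline evaluation loss of 0.1728 does not appear among the logged rows, so it presumably comes from a separate final evaluation; the card does not say so explicitly.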

Framework versions

PEFT 0.12.0
Transformers 4.47.0.dev0
Pytorch 2.5.1+cu121
Datasets 3.1.0
Tokenizers 0.20.3
