qwenvl-2B-cadica-direction-then-detect-and-classify-scale6

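This model is a fine-tuned version of ben81828/CADICA_qwenvl_direction, trained on two datasets. The first is CADICA狹窄分析選擇題scale6(TRAIN): CADICA stenosis analysis, multiple-choice questions, scale6, training split. The second is CADICA狹窄分析千問定位但不分類題scale6(TRAIN): CADICA stenosis analysis, Qwen-style localization questions without classification, scale6, training split. It achieves the following results on the evaluation set: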

  • Loss: 0.1728
  • Num Input Tokens Seen: 35316128

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 24
  • total_eval_batch_size: 4
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 3400
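Several quantities follow directly from the list above. The Python sketch below recomputes them: the effective batch sizes (per-device batch × devices, and × accumulation steps for training) and the learning-rate curve. It assumes the run used the Hugging Face Trainer's "cosine" schedule. Under that schedule, warmup lasts ceil(0.05 × 3400) = 170 steps and increases linearly. A single half-cosine then decays the rate to zero at step 3400. The constant names and `cosine_lr` are illustrative and do not come from the original training script.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 1
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 6
MAX_STEPS = 3400
WARMUP_RATIO = 0.05

# Gradient accumulation multiplies only the training batch, not the eval batch.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Assumption: warmup steps come from the ratio, rounded up (Trainer behaviour).
warmup_steps = math.ceil(MAX_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay.

    Assumes the cosine-with-warmup shape with num_cycles=0.5, which falls
    from the peak learning rate to exactly 0 at MAX_STEPS.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, MAX_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("total train batch:", total_train_batch, "total eval batch:", total_eval_batch)
    print("warmup steps:", warmup_steps)
    for s in (0, 85, warmup_steps, 1785, MAX_STEPS):
        print(f"step {s:>4}: lr = {cosine_lr(s):.2e}")
```

Under this assumption, the learning rate peaks at 1e-4 at step 170. It reaches half that value at step 1785, midway through the decay phase, and ends at zero. At the evaluation points in the table below, late checkpoints therefore see very small learning rates.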

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
------------- | ----- | ---- | --------------- | -----------------
1.3918 | 0.0148 | 50 | 1.0422 | 516240
0.8208 | 0.0295 | 100 | 0.8917 | 1030696
0.8125 | 0.0443 | 150 | 0.9009 | 1550792
0.7675 | 0.0591 | 200 | 0.9007 | 2071176
0.7558 | 0.0739 | 250 | 0.8108 | 2587272
0.78 | 0.0886 | 300 | 0.8194 | 3107200
0.6602 | 0.1034 | 350 | 0.7663 | 3625752
0.6739 | 0.1182 | 400 | 0.7039 | 4142592
0.5661 | 0.1329 | 450 | 0.7133 | 4663320
0.6283 | 0.1477 | 500 | 0.6505 | 5183664
0.5957 | 0.1625 | 550 | 0.6883 | 5703016
0.6331 | 0.1773 | 600 | 0.5883 | 6222736
0.5483 | 0.1920 | 650 | 0.6101 | 6743120
0.477 | 0.2068 | 700 | 0.5884 | 7262832
0.514 | 0.2216 | 750 | 0.4666 | 7779872
0.4239 | 0.2363 | 800 | 0.4822 | 8301976
0.4949 | 0.2511 | 850 | 0.6122 | 8822832
0.4852 | 0.2659 | 900 | 0.5606 | 9345160
0.4737 | 0.2806 | 950 | 0.4791 | 9863168
0.4005 | 0.2954 | 1000 | 0.5501 | 10379136
0.3991 | 0.3102 | 1050 | 0.4378 | 10897528
0.4624 | 0.3250 | 1100 | 0.5301 | 11413120
0.4432 | 0.3397 | 1150 | 0.4249 | 11933632
0.3296 | 0.3545 | 1200 | 0.2966 | 12456040
0.335 | 0.3693 | 1250 | 0.3185 | 12972696
0.3594 | 0.3840 | 1300 | 0.4716 | 13493264
0.3731 | 0.3988 | 1350 | 0.5566 | 14014736
0.388 | 0.4136 | 1400 | 0.3866 | 14532288
0.3131 | 0.4284 | 1450 | 0.4740 | 15050992
0.2928 | 0.4431 | 1500 | 0.4049 | 15572048
0.3588 | 0.4579 | 1550 | 0.2871 | 16091960
0.3879 | 0.4727 | 1600 | 0.3136 | 16609960
0.2698 | 0.4874 | 1650 | 0.4020 | 17130896
0.3904 | 0.5022 | 1700 | 0.3297 | 17650984
0.3173 | 0.5170 | 1750 | 0.4491 | 18169344
0.3127 | 0.5318 | 1800 | 0.3499 | 18691928
0.2828 | 0.5465 | 1850 | 0.3781 | 19212992
0.306 | 0.5613 | 1900 | 0.3766 | 19735976
0.2992 | 0.5761 | 1950 | 0.3468 | 20253288
0.2341 | 0.5908 | 2000 | 0.3366 | 20770728
0.2931 | 0.6056 | 2050 | 0.3386 | 21291664
0.1826 | 0.6204 | 2100 | 0.5386 | 21813984
0.2387 | 0.6352 | 2150 | 0.2581 | 22332144
0.2662 | 0.6499 | 2200 | 0.4840 | 22849552
0.2332 | 0.6647 | 2250 | 0.4966 | 23366784
0.2481 | 0.6795 | 2300 | 0.2418 | 23883032
0.2313 | 0.6942 | 2350 | 0.1870 | 24401256
0.262 | 0.7090 | 2400 | 0.3471 | 24921872
0.2412 | 0.7238 | 2450 | 0.3456 | 25439896
0.2382 | 0.7386 | 2500 | 0.2543 | 25961056
0.2364 | 0.7533 | 2550 | 0.3871 | 26477208
0.2082 | 0.7681 | 2600 | 0.3406 | 26997904
0.1736 | 0.7829 | 2650 | 0.2697 | 27521088
0.2225 | 0.7976 | 2700 | 0.4155 | 28042992
0.2501 | 0.8124 | 2750 | 0.4115 | 28561248
0.2507 | 0.8272 | 2800 | 0.3223 | 29079576
0.1928 | 0.8419 | 2850 | 0.2828 | 29600536
0.2029 | 0.8567 | 2900 | 0.3943 | 30118072
0.1692 | 0.8715 | 2950 | 0.2034 | 30637448
0.234 | 0.8863 | 3000 | 0.2556 | 31159736
0.2303 | 0.9010 | 3050 | 0.2253 | 31679080
0.1999 | 0.9158 | 3100 | 0.2710 | 32196176
0.2069 | 0.9306 | 3150 | 0.2029 | 32713824
0.2135 | 0.9453 | 3200 | 0.3564 | 33235872
0.1964 | 0.9601 | 3250 | 0.3081 | 33752488
0.2131 | 0.9749 | 3300 | 0.3541 | 34269496
0.1779 | 0.9897 | 3350 | 0.2255 | 34784784
0.2173 | 1.0044 | 3400 | 0.4078 | 35305984
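Validation loss fluctuates considerably from one evaluation to the next, so picking the best checkpoint by eye is error-prone. The Python sketch below parses rows of the table and selects the step with the lowest validation loss. It assumes the table has been copied as plain text, one row per line, with columns in the order shown. The parser accepts both space-separated and pipe-delimited rows and skips header or separator lines. `Row`, `parse_rows`, and `SAMPLE` are hypothetical names. `SAMPLE` embeds four rows taken verbatim from the table.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_rows(text: str) -> List[Row]:
    """Parse table rows, skipping anything that is not five numeric fields."""
    rows: List[Row] = []
    for line in text.strip().splitlines():
        # Treat "|" as whitespace so both table layouts parse the same way.
        fields = line.replace("|", " ").split()
        if len(fields) != 5:
            continue  # the header line has more than five words
        try:
            tl, ep, st, vl, tok = fields
            rows.append(Row(float(tl), float(ep), int(st), float(vl), int(tok)))
        except ValueError:
            continue  # e.g. a "-----" separator line
    return rows


# Four rows copied from the training-results table.
SAMPLE = """
0.2481 0.6795 2300 0.2418 23883032
0.2313 0.6942 2350 0.1870 24401256
0.262 0.7090 2400 0.3471 24921872
0.2173 1.0044 3400 0.4078 35305984
"""

rows = parse_rows(SAMPLE)
best = min(rows, key=lambda r: r.val_loss)
last = rows[-1]
tokens_per_step = last.tokens_seen / last.step  # average over the whole run

if __name__ == "__main__":
    print(f"best step {best.step}: validation loss {best.val_loss}")
    print(f"~{tokens_per_step:.0f} input tokens per optimizer step")
    print(f"~{tokens_per_step / 24:.0f} tokens per sequence (total batch 24)")
```

Applied to the full table, the same minimum lookup selects step 2350, with a validation loss of 0.1870. Dividing the final token count by 3400 steps gives roughly 10.4k input tokens per optimizer step. At a total train batch of 24, that is about 433 tokens per sequence, assuming the counter sums over all four devices.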

Framework versions

  • PEFT 0.12.0
  • Transformers 4.47.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3
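To reproduce the run, it helps to check a local environment against these pins. The Python sketch below does this with the standard library's `importlib.metadata`. `PINNED` and `find_mismatches` are illustrative names, and the keys are PyPI distribution names; PyTorch, for example, is distributed as `torch`. Transformers 4.47.0.dev0 is a development pre-release, usually installed from source, so an exact match may not be available from PyPI. The exact string comparison is deliberately strict.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions", keyed by distribution name.
PINNED: Dict[str, str] = {
    "peft": "0.12.0",
    "transformers": "4.47.0.dev0",
    "torch": "2.5.1+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed version} for every package that differs.

    A value of None means the package is not installed at all.
    """
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            found: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for pkg, found in find_mismatches(PINNED).items():
        print(f"{pkg}: expected {PINNED[pkg]}, found {found or 'not installed'}")
```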