---
license: mit
base_model: microsoft/deberta-v3-large
tags:
- generated_from_trainer
datasets:
- squad_v2
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results: []
---
# deberta-v3-large-finetuned-squadv2
This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5579
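
Since the model targets extractive question answering on SQuAD v2, which includes unanswerable questions, a minimal usage sketch with the 🤗 Transformers `pipeline` API is shown below; the checkpoint id is a placeholder for wherever this model is hosted.

```python
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub location of this checkpoint.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

result = qa(
    question="What was the model fine-tuned on?",
    context="The model was fine-tuned on the SQuAD v2 dataset.",
    handle_impossible_answer=True,  # SQuAD v2 contains unanswerable questions
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```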
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
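
The card's metadata declares squad_v2 as the training dataset; a minimal sketch of loading it with 🤗 Datasets follows, assuming the dataset's standard `train`/`validation` splits.

```python
from datasets import load_dataset

# squad_v2 as declared in this card's metadata.
squad_v2 = load_dataset("squad_v2")
print(squad_v2["train"][0]["question"])
```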
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 5200
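
These map onto 🤗 `TrainingArguments` roughly as follows. This is a hedged reconstruction, not the actual training script: `output_dir` is a placeholder, and the eval cadence is inferred from the results table below, which logs validation every 20 steps.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 x 8 = total train batch size of 64
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
    evaluation_strategy="steps",  # inferred from the results table
    eval_steps=20,
)
```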
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5293 | 1.57 | 3200 | 0.5739 |
| 0.5106 | 1.58 | 3220 | 0.5783 |
| 0.5338 | 1.59 | 3240 | 0.5718 |
| 0.5128 | 1.6 | 3260 | 0.5827 |
| 0.5205 | 1.61 | 3280 | 0.6045 |
| 0.5114 | 1.62 | 3300 | 0.5880 |
| 0.5072 | 1.63 | 3320 | 0.5788 |
| 0.5512 | 1.64 | 3340 | 0.5863 |
| 0.4723 | 1.65 | 3360 | 0.5898 |
| 0.5011 | 1.66 | 3380 | 0.5917 |
| 0.5419 | 1.67 | 3400 | 0.6027 |
| 0.5425 | 1.68 | 3420 | 0.5699 |
| 0.5703 | 1.69 | 3440 | 0.5897 |
| 0.4646 | 1.7 | 3460 | 0.5917 |
| 0.4652 | 1.71 | 3480 | 0.5745 |
| 0.5323 | 1.72 | 3500 | 0.5860 |
| 0.5129 | 1.73 | 3520 | 0.5656 |
| 0.5441 | 1.74 | 3540 | 0.5642 |
| 0.5624 | 1.75 | 3560 | 0.5873 |
| 0.4645 | 1.76 | 3580 | 0.5891 |
| 0.5577 | 1.77 | 3600 | 0.5816 |
| 0.5199 | 1.78 | 3620 | 0.5579 |
| 0.5061 | 1.79 | 3640 | 0.5837 |
| 0.484 | 1.79 | 3660 | 0.5721 |
| 0.5095 | 1.8 | 3680 | 0.5821 |
| 0.5342 | 1.81 | 3700 | 0.5602 |
| 0.5435 | 1.82 | 3720 | 0.5911 |
| 0.5288 | 1.83 | 3740 | 0.5647 |
| 0.5476 | 1.84 | 3760 | 0.5733 |
| 0.5199 | 1.85 | 3780 | 0.5675 |
| 0.5067 | 1.86 | 3800 | 0.5839 |
| 0.5418 | 1.87 | 3820 | 0.5757 |
| 0.4965 | 1.88 | 3840 | 0.5764 |
| 0.5273 | 1.89 | 3860 | 0.5906 |
| 0.5808 | 1.9 | 3880 | 0.5762 |
| 0.5161 | 1.91 | 3900 | 0.5612 |
| 0.4863 | 1.92 | 3920 | 0.5804 |
| 0.4827 | 1.93 | 3940 | 0.5841 |
| 0.4643 | 1.94 | 3960 | 0.5822 |
| 0.5029 | 1.95 | 3980 | 0.6052 |
| 0.509 | 1.96 | 4000 | 0.5800 |
| 0.5382 | 1.97 | 4020 | 0.5645 |
| 0.469 | 1.98 | 4040 | 0.5685 |
| 0.5032 | 1.99 | 4060 | 0.5779 |
| 0.5171 | 2.0 | 4080 | 0.5686 |
| 0.3938 | 2.01 | 4100 | 0.5889 |
| 0.4321 | 2.02 | 4120 | 0.6039 |
| 0.4185 | 2.03 | 4140 | 0.5996 |
| 0.4782 | 2.04 | 4160 | 0.5800 |
| 0.424 | 2.05 | 4180 | 0.6374 |
| 0.3766 | 2.06 | 4200 | 0.6096 |
| 0.415 | 2.07 | 4220 | 0.6221 |
| 0.4352 | 2.08 | 4240 | 0.6150 |
| 0.4336 | 2.09 | 4260 | 0.6055 |
| 0.4289 | 2.1 | 4280 | 0.6138 |
| 0.4433 | 2.11 | 4300 | 0.5946 |
| 0.4478 | 2.12 | 4320 | 0.6118 |
| 0.4787 | 2.13 | 4340 | 0.5969 |
| 0.4432 | 2.14 | 4360 | 0.6048 |
| 0.4319 | 2.15 | 4380 | 0.5948 |
| 0.3939 | 2.16 | 4400 | 0.6116 |
| 0.3921 | 2.17 | 4420 | 0.6082 |
| 0.4381 | 2.18 | 4440 | 0.6282 |
| 0.4461 | 2.19 | 4460 | 0.6084 |
| 0.4012 | 2.2 | 4480 | 0.6092 |
| 0.3849 | 2.21 | 4500 | 0.6152 |
| 0.4178 | 2.22 | 4520 | 0.6004 |
| 0.4163 | 2.23 | 4540 | 0.6059 |
| 0.4006 | 2.24 | 4560 | 0.6115 |
| 0.4225 | 2.25 | 4580 | 0.6130 |
| 0.4008 | 2.26 | 4600 | 0.6095 |
| 0.4706 | 2.27 | 4620 | 0.6136 |
| 0.3902 | 2.28 | 4640 | 0.6103 |
| 0.4048 | 2.29 | 4660 | 0.6085 |
| 0.4411 | 2.3 | 4680 | 0.6139 |
| 0.403 | 2.31 | 4700 | 0.6047 |
| 0.4799 | 2.31 | 4720 | 0.6043 |
| 0.4316 | 2.32 | 4740 | 0.5960 |
| 0.4198 | 2.33 | 4760 | 0.6031 |
| 0.4254 | 2.34 | 4780 | 0.6033 |
| 0.387 | 2.35 | 4800 | 0.6120 |
| 0.3882 | 2.36 | 4820 | 0.6128 |
| 0.4307 | 2.37 | 4840 | 0.6150 |
| 0.434 | 2.38 | 4860 | 0.6077 |
| 0.4225 | 2.39 | 4880 | 0.6071 |
| 0.4134 | 2.4 | 4900 | 0.6036 |
| 0.3846 | 2.41 | 4920 | 0.6124 |
| 0.3943 | 2.42 | 4940 | 0.6291 |
| 0.4455 | 2.43 | 4960 | 0.6185 |
| 0.4104 | 2.44 | 4980 | 0.6064 |
| 0.4158 | 2.45 | 5000 | 0.6095 |
| 0.4135 | 2.46 | 5020 | 0.6155 |
| 0.3789 | 2.47 | 5040 | 0.6209 |
| 0.418 | 2.48 | 5060 | 0.6106 |
| 0.3931 | 2.49 | 5080 | 0.6047 |
| 0.4289 | 2.5 | 5100 | 0.6055 |
| 0.4051 | 2.51 | 5120 | 0.6084 |
| 0.4217 | 2.52 | 5140 | 0.6118 |
| 0.3843 | 2.53 | 5160 | 0.6139 |
| 0.4435 | 2.54 | 5180 | 0.6126 |
| 0.4274 | 2.55 | 5200 | 0.6120 |
### Framework versions
- Transformers 4.35.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0