---
license: mit
base_model: microsoft/git-base
tags:
  - generated_from_trainer
datasets:
  - imagefolder
model-index:
  - name: git-base-captioning
    results: []
---

# git-base-captioning

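This model is a fine-tuned version of [microsoft/git-base](https://huggingface.co/microsoft/git-base) for image captioning. It was trained on an image dataset loaded with the Hugging Face Datasets `imagefolder` builder. It achieves the following results on the evaluation set, which are the same values as the final logged evaluation at step 560:

- Loss: 0.3575
- WER score (word error rate, lower is better): 0.8322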
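The card does not state how the WER score was computed. A common choice in `Trainer`-based scripts is the Hugging Face Evaluate `wer` metric, which aggregates errors over the whole evaluation set.

The sketch below is an independent plain-Python version of corpus-level word error rate, not the training script's code. It also shows why early rows of the training log report WER values above 1. Insertions count as errors against the reference length, so captions much longer than their references can exceed 100% error.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    # prev[j] = distance between the reference prefix seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution (0 if words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words across all pairs."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / total_words


print(corpus_wer(["a dog on the grass"], ["a dog on the grass"]))  # 0.0
print(corpus_wer(["a dog"], ["a dog a dog a dog"]))  # 2.0 (4 insertions / 2 reference words)
```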

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
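The hyperparameter list above can be expressed as keyword arguments for `transformers.TrainingArguments`. This mapping is a sketch under the following assumptions:

- The card does not include the training script.
- `fp16=True` is a guess for "Native AMP". bf16 autocast is also possible.
- No warmup is assumed, because none is listed.

The sketch also spells out two derived quantities:

- The effective batch size (4 × 2 = 8), which is consistent with training on a single device.
- The linear learning-rate schedule, which decays from 5e-05 to zero over the total number of optimizer steps.

The code does not import `transformers`, so it runs anywhere. The dict could be passed as `TrainingArguments(output_dir=..., **TRAINING_ARGS)`.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 2

# Assumed mapping onto transformers.TrainingArguments keyword arguments.
# fp16=True is an assumption: "Native AMP" could also correspond to bf16.
TRAINING_ARGS = {
    "learning_rate": LEARNING_RATE,
    "per_device_train_batch_size": TRAIN_BATCH_SIZE,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": GRADIENT_ACCUMULATION_STEPS,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,
}


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = 0) -> float:
    """Linear warmup (none by default), then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 8
# 560 is the last logged step, used only for illustration; the card does not
# state the exact total number of optimizer steps.
print(linear_lr(0, total_steps=560))    # 5e-05
print(linear_lr(280, total_steps=560))  # 2.5e-05
```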

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER Score |
|:-------------:|:------:|:----:|:---------------:|:---------:|
| 7.8992 | 0.3540 | 20 | 7.5466 | 11.1579 |
| 5.9879 | 0.7080 | 40 | 5.6121 | 5.1946 |
| 4.1288 | 1.0619 | 60 | 3.7153 | 4.5433 |
| 2.3477 | 1.4159 | 80 | 1.9989 | 4.0242 |
| 1.0242 | 1.7699 | 100 | 0.8650 | 0.8657 |
| 0.4954 | 2.1239 | 120 | 0.4766 | 0.8676 |
| 0.3365 | 2.4779 | 140 | 0.3993 | 3.7516 |
| 0.4286 | 2.8319 | 160 | 0.3773 | 0.8336 |
| 0.2952 | 3.1858 | 180 | 0.3663 | 0.8329 |
| 0.3996 | 3.5398 | 200 | 0.3619 | 0.8224 |
| 0.2629 | 3.8938 | 220 | 0.3574 | 0.8204 |
| 0.254 | 4.2478 | 240 | 0.3555 | 0.8178 |
| 0.2642 | 4.6018 | 260 | 0.3557 | 0.8132 |
| 0.2725 | 4.9558 | 280 | 0.3533 | 0.8139 |
| 0.2746 | 5.3097 | 300 | 0.3554 | 0.8093 |
| 0.1765 | 5.6637 | 320 | 0.3561 | 0.8244 |
| 0.2981 | 6.0177 | 340 | 0.3542 | 0.8375 |
| 0.1489 | 6.3717 | 360 | 0.3567 | 0.8329 |
| 0.256 | 6.7257 | 380 | 0.3553 | 0.8362 |
| 0.1574 | 7.0796 | 400 | 0.3558 | 0.8342 |
| 0.1836 | 7.4336 | 420 | 0.3566 | 0.8336 |
| 0.1697 | 7.7876 | 440 | 0.3578 | 0.8362 |
| 0.1596 | 8.1416 | 460 | 0.3571 | 0.8414 |
| 0.1628 | 8.4956 | 480 | 0.3579 | 0.8388 |
| 0.1958 | 8.8496 | 500 | 0.3572 | 0.8362 |
| 0.1695 | 9.2035 | 520 | 0.3575 | 0.8303 |
| 0.1686 | 9.5575 | 540 | 0.3576 | 0.8336 |
| 0.2166 | 9.9115 | 560 | 0.3575 | 0.8322 |
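The final row of the training-results table (step 560) matches the evaluation results reported at the top of this card. It is not, however, the row with the lowest validation loss or the lowest WER. The plain-Python sketch below transcribes the rows from the table and does two things:

- It locates the best rows for each metric.
- It derives the number of optimizer steps per epoch from the fractional epoch column.

It does not assume that a checkpoint was saved at any of these steps. Whether one exists depends on the save strategy, which the card does not state.

```python
# (step, epoch, training_loss, validation_loss, wer) transcribed from the training-results table.
LOG = [
    (20, 0.3540, 7.8992, 7.5466, 11.1579),
    (40, 0.7080, 5.9879, 5.6121, 5.1946),
    (60, 1.0619, 4.1288, 3.7153, 4.5433),
    (80, 1.4159, 2.3477, 1.9989, 4.0242),
    (100, 1.7699, 1.0242, 0.8650, 0.8657),
    (120, 2.1239, 0.4954, 0.4766, 0.8676),
    (140, 2.4779, 0.3365, 0.3993, 3.7516),
    (160, 2.8319, 0.4286, 0.3773, 0.8336),
    (180, 3.1858, 0.2952, 0.3663, 0.8329),
    (200, 3.5398, 0.3996, 0.3619, 0.8224),
    (220, 3.8938, 0.2629, 0.3574, 0.8204),
    (240, 4.2478, 0.254, 0.3555, 0.8178),
    (260, 4.6018, 0.2642, 0.3557, 0.8132),
    (280, 4.9558, 0.2725, 0.3533, 0.8139),
    (300, 5.3097, 0.2746, 0.3554, 0.8093),
    (320, 5.6637, 0.1765, 0.3561, 0.8244),
    (340, 6.0177, 0.2981, 0.3542, 0.8375),
    (360, 6.3717, 0.1489, 0.3567, 0.8329),
    (380, 6.7257, 0.256, 0.3553, 0.8362),
    (400, 7.0796, 0.1574, 0.3558, 0.8342),
    (420, 7.4336, 0.1836, 0.3566, 0.8336),
    (440, 7.7876, 0.1697, 0.3578, 0.8362),
    (460, 8.1416, 0.1596, 0.3571, 0.8414),
    (480, 8.4956, 0.1628, 0.3579, 0.8388),
    (500, 8.8496, 0.1958, 0.3572, 0.8362),
    (520, 9.2035, 0.1695, 0.3575, 0.8303),
    (540, 9.5575, 0.1686, 0.3576, 0.8336),
    (560, 9.9115, 0.2166, 0.3575, 0.8322),
]
STEP, EPOCH, TRAIN_LOSS, VAL_LOSS, WER = range(5)  # column indices


def best_row(log, column):
    """Row with the smallest value in `column` (lower is better for both metrics)."""
    return min(log, key=lambda row: row[column])


# The epoch column is fractional, so step / epoch gives optimizer steps per epoch.
steps_per_epoch = LOG[-1][STEP] / LOG[-1][EPOCH]

print(round(steps_per_epoch, 1))     # 56.5
print(best_row(LOG, VAL_LOSS)[STEP])  # 280
print(best_row(LOG, WER)[STEP])       # 300
```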

### Framework versions

- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
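When reproducing the training environment, installed versions can be compared against the framework versions listed above. This is a minimal sketch using the standard-library `importlib.metadata`. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`.

The PyTorch entry includes the local version tag `+cu121`. Because the comparison is an exact string match, a CPU-only or differently built 2.3.0 wheel is reported as a mismatch.

```python
from importlib import metadata

# Versions listed under "Framework versions"; PyTorch's distribution name is "torch".
PINNED = {
    "transformers": "4.41.2",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (expected, installed)} for every package whose installed
    version differs from the pinned one; installed is None if it is missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in version_mismatches().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```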