# 20231102-20_epochs_layoutlmv2-base-uncased_finetuned_docvqa
This model is a fine-tuned version of layoutlmv2-base-uncased on the 1.2k-sample example dataset released by DocVQA. It achieves the following results on the evaluation set:
- Loss: 2.9087
## Model description
This DocVQA model, built on the LayoutLMv2 architecture, is an initial step in a series of experimental models for document visual question answering. It is the "mini" version in a planned series, trained on a relatively small dataset of 1.2k samples (1,000 for training and 200 for testing) over 20 epochs. The training setup was modest: mixed precision (fp16), moderate batch sizes, learning-rate warmup, and weight decay. Notably, the model was trained without external reporting tools, relying on internal evaluation only. As the first iteration in a progressive series that will later include medium (5k samples) and large (50k samples) models, this version serves as a foundational experiment, setting the stage for more extensive and complex models in the future.
## Intended uses & limitations
This model is experimental only and is not intended for production use.
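A minimal inference sketch is shown below. The Hub repository id, image path, and question are placeholders (assumptions, not taken from this card), and the LayoutLMv2 processor additionally requires detectron2 and Tesseract OCR to be installed for its built-in OCR step.

```python
# Hypothetical usage sketch: the repository id, image path, and question are
# placeholders, not values recorded in this card.
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForQuestionAnswering.from_pretrained(
    "your-username/20231102-20_epochs_layoutlmv2-base-uncased_finetuned_docvqa"  # placeholder repo id
)

image = Image.open("document.png").convert("RGB")  # placeholder document image
question = "What is the invoice date?"             # placeholder question

# The processor runs OCR on the image and pairs the question with the tokens.
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)

# Extractive QA: decode the span between the most likely start and end tokens.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(encoding["input_ids"][0][start : end + 1])
print(answer)
```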
## Training and evaluation data
Based on the 1.2k-sample example dataset released by DocVQA (1,000 training samples and 200 test samples).
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
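For reference, a hedged sketch of how these settings map onto Hugging Face `TrainingArguments` follows. The output directory is a placeholder; warmup steps and weight decay were used (per the description) but their exact values are not recorded in this card, so they are omitted here. The remaining values mirror the list above.

```python
# Hedged sketch of the recorded settings as Hugging Face TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="layoutlmv2-base-uncased_finetuned_docvqa",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,   # effective train batch size of 32
    num_train_epochs=20,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                       # mixed precision, as noted in the description
    report_to="none",                # no external reporting tools, per the description
)
```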
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 4.3689        | 3.51  | 100  | 3.7775          |
| 3.2761        | 7.02  | 200  | 3.3707          |
| 2.6415        | 10.53 | 300  | 3.0807          |
| 2.2233        | 14.04 | 400  | 3.0120          |
| 1.9586        | 17.54 | 500  | 2.9087          |
### Framework versions
- Transformers 4.34.1
- Pytorch 2.0.1+cu118
- Datasets 2.10.1
- Tokenizers 0.14.1