
layoutlmv3-base-finetuned-rvlcdip

This model is a fine-tuned version of microsoft/layoutlmv3-base on the RVL-CDIP dataset, with words and bounding boxes extracted using Amazon OCR. The following metrics were computed on the evaluation set after the final optimization step:

  • Evaluation Loss: 0.1856316477060318
  • Evaluation Accuracy: 0.9519237980949524
  • Evaluation Weighted F1: 0.9518911690649716
  • Evaluation Micro F1: 0.9519237980949524
  • Evaluation Macro F1: 0.9518042570370386
  • Evaluation Weighted Recall: 0.9519237980949524
  • Evaluation Micro Recall: 0.9519237980949524
  • Evaluation Macro Recall: 0.9518171728908463
  • Evaluation Weighted Precision: 0.9519094862975979
  • Evaluation Micro Precision: 0.9519237980949524
  • Evaluation Macro Precision: 0.9518423447239385
  • Evaluation Runtime (seconds): 514.7031
  • Evaluation Samples per Second: 77.713
  • Evaluation Steps per Second: 1.214
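
The metric computation script is not included in this card. Below is a minimal sketch of a Trainer compute_metrics callback that would produce the metric set above, assuming scikit-learn is available (the function and key names are illustrative, not taken from the original training code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    """Compute accuracy plus weighted / micro / macro F1, recall, and precision."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    metrics = {"accuracy": accuracy_score(labels, preds)}
    for average in ("weighted", "micro", "macro"):
        metrics[f"{average}_f1"] = f1_score(labels, preds, average=average)
        metrics[f"{average}_recall"] = recall_score(labels, preds, average=average)
        metrics[f"{average}_precision"] = precision_score(labels, preds, average=average)
    return metrics
```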

Training logs

See wandb report: https://api.wandb.ai/links/gordon-lim/lokqu7ok

Training arguments

The following arguments were provided to Trainer:

  • Output Directory: ./results
  • Maximum Steps: 20000
  • Per Device Train Batch Size: 32 (the paper uses 64; due to CUDA memory constraints, training used 2 GPUs at 32 per device, for an effective batch size of 64)
  • Per Device Evaluation Batch Size: 32 (due to CUDA memory constraints)
  • Warmup Steps: 0 (not specified in the paper for RVL-CDIP; a warmup ratio is only given for DocVQA, so the default was assumed)
  • Weight Decay: 0 (not specified in the paper for RVL-CDIP; 0.05 is used for PubLayNet, so the default was assumed)
  • Evaluation Strategy: steps
  • Evaluation Steps: 1000
  • Evaluate on Start: True
  • Save Strategy: steps
  • Save Steps: 1000
  • Save Total Limit: 5
  • Learning Rate: 2e-5
  • Load Best Model at End: True
  • Metric for Best Model: accuracy
  • Greater is Better: True
  • Report to: wandb (log to Weights & Biases)
  • Logging Steps: 1000
  • Logging First Step: True
  • Learning Rate Scheduler Type: cosine (not mentioned in the paper, but the PubLayNet fine-tuning example on GitHub uses 'cosine')
  • FP16: True (due to CUDA memory constraints)
  • Dataloader Number of Workers: 4 (number of subprocesses to use for data loading)
  • DDP Find Unused Parameters: True
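
Expressed against the Trainer API in Transformers 4.42, these settings correspond roughly to the following TrainingArguments (a sketch; the model, dataset, and compute_metrics wiring is omitted):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    max_steps=20000,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=0,
    weight_decay=0.0,
    eval_strategy="steps",
    eval_steps=1000,
    eval_on_start=True,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=5,
    learning_rate=2e-5,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="wandb",
    logging_steps=1000,
    logging_first_step=True,
    lr_scheduler_type="cosine",
    fp16=True,
    dataloader_num_workers=4,
    ddp_find_unused_parameters=True,
)
```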

Framework versions

  • Transformers 4.42.3
  • PyTorch 2.2.0+cu121
  • Datasets 2.14.0
  • Tokenizers 0.19.1
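
With the versions above, the checkpoint can be loaded for document-image classification along these lines (a minimal sketch, not from the original card; the repository id, image path, and OCR words/boxes are illustrative assumptions):

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

# apply_ocr=False because words and boxes are supplied from an external OCR step
# (this model was fine-tuned on RVL-CDIP text extracted with Amazon OCR).
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3ForSequenceClassification.from_pretrained("<this-repo-id>")  # placeholder for this model's Hub id

image = Image.open("document.png").convert("RGB")
words = ["Invoice", "No.", "12345"]                                   # OCR tokens (illustrative)
boxes = [[70, 40, 200, 60], [210, 40, 260, 60], [270, 40, 340, 60]]   # boxes normalized to 0-1000

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits
print(model.config.id2label[logits.argmax(-1).item()])
```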
