OCR-LayoutLMv3 / README.md
jinhybr's picture
Update README.md
5fa47bf
---
license: cc-by-nc-sa-4.0
tags:
- generated_from_trainer
datasets:
- funsd-layoutlmv3
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: OCR-LayoutLMv3
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: funsd-layoutlmv3
type: funsd-layoutlmv3
config: funsd
split: train
args: funsd
metrics:
- name: Precision
type: precision
value: 0.8988653182042428
- name: Recall
type: recall
value: 0.905116741182315
- name: F1
type: f1
value: 0.9019801980198019
- name: Accuracy
type: accuracy
value: 0.8403661000832046
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# OCR-LayoutLMv3
This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) on the funsd-layoutlmv3 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9788
- Precision: 0.8989
- Recall: 0.9051
- F1: 0.9020
- Accuracy: 0.8404
## Model description
LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis.
[LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387)
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, Preprint 2022.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 2000
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log | 1.33 | 100 | 0.6966 | 0.7418 | 0.8063 | 0.7727 | 0.7801 |
| No log | 2.67 | 200 | 0.5767 | 0.8104 | 0.8644 | 0.8365 | 0.8117 |
| No log | 4.0 | 300 | 0.5355 | 0.8246 | 0.8852 | 0.8539 | 0.8295 |
| No log | 5.33 | 400 | 0.5240 | 0.8706 | 0.8922 | 0.8813 | 0.8427 |
| 0.5326 | 6.67 | 500 | 0.6337 | 0.8528 | 0.8778 | 0.8651 | 0.8260 |
| 0.5326 | 8.0 | 600 | 0.6870 | 0.8698 | 0.8828 | 0.8762 | 0.8240 |
| 0.5326 | 9.33 | 700 | 0.6584 | 0.8723 | 0.9061 | 0.8889 | 0.8342 |
| 0.5326 | 10.67 | 800 | 0.7186 | 0.8868 | 0.9031 | 0.8949 | 0.8335 |
| 0.5326 | 12.0 | 900 | 0.6822 | 0.9040 | 0.9076 | 0.9058 | 0.8526 |
| 0.1248 | 13.33 | 1000 | 0.7042 | 0.8872 | 0.9021 | 0.8946 | 0.8511 |
| 0.1248 | 14.67 | 1100 | 0.7920 | 0.9027 | 0.9036 | 0.9032 | 0.8480 |
| 0.1248 | 16.0 | 1200 | 0.8052 | 0.8964 | 0.9151 | 0.9056 | 0.8389 |
| 0.1248 | 17.33 | 1300 | 0.8932 | 0.8995 | 0.9066 | 0.9030 | 0.8329 |
| 0.1248 | 18.67 | 1400 | 0.8728 | 0.8950 | 0.9061 | 0.9005 | 0.8398 |
| 0.0442 | 20.0 | 1500 | 0.9051 | 0.8960 | 0.9116 | 0.9037 | 0.8347 |
| 0.0442 | 21.33 | 1600 | 0.9587 | 0.8947 | 0.9031 | 0.8989 | 0.8401 |
| 0.0442 | 22.67 | 1700 | 0.9822 | 0.9042 | 0.9046 | 0.9044 | 0.8389 |
| 0.0442 | 24.0 | 1800 | 0.9734 | 0.9043 | 0.9061 | 0.9052 | 0.8391 |
| 0.0442 | 25.33 | 1900 | 0.9842 | 0.9042 | 0.9091 | 0.9066 | 0.8410 |
| 0.0225 | 26.67 | 2000 | 0.9788 | 0.8989 | 0.9051 | 0.9020 | 0.8404 |
### Framework versions
- Transformers 4.25.0.dev0
- Pytorch 1.12.1
- Datasets 2.6.1
- Tokenizers 0.13.1