doc2txt committed
Commit ee6fdb6 (1 parent: a77090b)

Update README.md

Files changed (1): README.md (+24 -0)
README.md CHANGED
@@ -39,6 +39,30 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+ # overfitting issue
+ I used this Colab notebook:
+ https://colab.research.google.com/drive/1AXh3G3-VmbMWlwbSvesVIurzNlcezTce?usp=sharing
+
+ to fine-tune LayoutLMv2ForTokenClassification on the CORD dataset.
+
+ Here is the result:
+ https://huggingface.co/doc2txt/layoutlmv2-finetuned-cord
+
+ * F1: 0.9665
+
+ And indeed the results are pretty amazing when running on the test set;
+ however, when running on any other receipt (printed or PDF) the results are completely off.
+
+ So for some reason the model is overfitting to the CORD dataset, even though I use similar images for testing.
+
+ I don't think there is **data leakage**, unless the CORD dataset is not clean (which I assume it is).
+
+ What could be the reason for this?
+ Is it some inherent property of LayoutLM?
+ The LayoutLM models are somewhat old, and the project seems abandoned...
+
+ I don't have much experience, so I would appreciate any info.
+ Thanks

 # layoutlmv2-finetuned-cord
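For reference, the out-of-distribution check described in the added section amounts to running the fine-tuned checkpoint on a receipt that is not from CORD. Below is a minimal sketch of that check using the standard `transformers` LayoutLMv2 API; the base `microsoft/layoutlmv2-base-uncased` processor (which runs Tesseract OCR) and the `receipt.jpg` path are assumptions, not taken from the linked Colab.

```python
# Minimal sketch (assumptions noted above): run the fine-tuned CORD checkpoint
# on an arbitrary, non-CORD receipt image and print per-token label predictions.
# Requires pytesseract and detectron2 in addition to transformers.
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForTokenClassification.from_pretrained("doc2txt/layoutlmv2-finetuned-cord")

image = Image.open("receipt.jpg").convert("RGB")  # placeholder path: any receipt outside CORD
encoding = processor(image, return_tensors="pt", truncation=True)  # OCR, tokenization, bounding boxes

outputs = model(**encoding)
pred_ids = outputs.logits.argmax(-1).squeeze().tolist()

tokens = processor.tokenizer.convert_ids_to_tokens(encoding["input_ids"].squeeze().tolist())
for token, pred in zip(tokens, pred_ids):
    print(token, model.config.id2label[pred])
```

Running the same script on a CORD test image versus an external receipt is the comparison the section above refers to.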