PeterStaar committed on
Commit
76cf972
1 Parent(s): 96e8ba4

updated the README

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

Files changed (1)
  1. README.md +70 -0
README.md CHANGED
@@ -1,3 +1,73 @@
  ---
  license: cdla-permissive-2.0
  ---
+
+ # Docling Models
+
+ This page contains the models that power the PDF document conversion package [docling](https://github.com/DS4SD/docling).
+
+ ## Layout Model
+
+ The layout model takes an image of a page and applies an RT-DETR model to find the different layout components. It currently detects the labels Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title. For reference, the table below (from the DocLayNet paper) shows the performance (mAP@0.5-0.95, in %) of standard object detection methods on the DocLayNet dataset compared to human evaluation:
+
+ |                | human | MRCNN R50 | MRCNN R101 | FRCNN R101 | YOLO v5x6 |
+ |----------------|-------|-----------|------------|------------|-----------|
+ | Caption        | 84-89 | 68.4      | 71.5       | 70.1       | 77.7      |
+ | Footnote       | 83-91 | 70.9      | 71.8       | 73.7       | 77.2      |
+ | Formula        | 83-85 | 60.1      | 63.4       | 63.5       | 66.2      |
+ | List-item      | 87-88 | 81.2      | 80.8       | 81.0       | 86.2      |
+ | Page-footer    | 93-94 | 61.6      | 59.3       | 58.9       | 61.1      |
+ | Page-header    | 85-89 | 71.9      | 70.0       | 72.0       | 67.9      |
+ | Picture        | 69-71 | 71.7      | 72.7       | 72.0       | 77.1      |
+ | Section-header | 83-84 | 67.6      | 69.3       | 68.4       | 74.6      |
+ | Table          | 77-81 | 82.2      | 82.9       | 82.2       | 86.3      |
+ | Text           | 84-86 | 84.6      | 85.8       | 85.4       | 88.1      |
+ | Title          | 60-72 | 76.7      | 80.4       | 79.9       | 82.7      |
+ | All            | 82-83 | 72.4      | 73.5       | 73.4       | 76.8      |
+
+ ## TableFormer
+
+ The TableFormer model identifies the structure of a table, starting from an image of the table. It uses the table regions predicted by the layout model to locate the tables. TableFormer achieves state-of-the-art table structure identification, as measured by the TEDS score:
+
+ | Model (TEDS) | Simple table | Complex table | All tables |
+ | ------------ | ------------ | ------------- | ---------- |
+ | Tabula       | 78.0         | 57.8          | 67.9       |
+ | Traprange    | 60.8         | 49.9          | 55.4       |
+ | Camelot      | 80.0         | 66.0          | 73.0       |
+ | Acrobat Pro  | 68.9         | 61.8          | 65.3       |
+ | EDD          | 91.2         | 85.4          | 88.3       |
+ | TableFormer  | 95.4         | 90.1          | 93.6       |
+
+ ## References
+
+ ```
+ @techreport{Docling,
+ author = {Deep Search Team},
+ month = {8},
+ title = {{Docling Technical Report}},
+ url = {https://arxiv.org/abs/2408.09869},
+ eprint = {2408.09869},
+ doi = {10.48550/arXiv.2408.09869},
+ version = {1.0.0},
+ year = {2024}
+ }
+
+ @article{doclaynet2022,
+ title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
+ doi = {10.1145/3534678.353904},
+ url = {https://arxiv.org/abs/2206.01062},
+ author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
+ year = {2022}
+ }
+
+ @InProceedings{TableFormer2022,
+ author = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
+ title = {TableFormer: Table Structure Understanding With Transformers},
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+ month = {June},
+ year = {2022},
+ pages = {4614-4623},
+ doi = {10.1109/CVPR52688.2022.00457}
+ }
+ ```
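
As a rough aid for reading the layout table in the README above, the sketch below transcribes the per-class scores into Python and lists the labels where a detector scores above the upper end of the human range. The names `MODELS`, `SCORES`, `best_model`, and `classes_above_human` are illustrative assumptions introduced here; they are not part of docling or of this commit.

```python
# Scores transcribed from the layout table in the README above (DocLayNet paper).
# Row layout: (human low, human high, MRCNN R50, MRCNN R101, FRCNN R101, YOLO v5x6).
# All names in this sketch are hypothetical helpers, not docling APIs.
MODELS = ("MRCNN R50", "MRCNN R101", "FRCNN R101", "YOLO v5x6")

SCORES = {
    "Caption": (84, 89, 68.4, 71.5, 70.1, 77.7),
    "Footnote": (83, 91, 70.9, 71.8, 73.7, 77.2),
    "Formula": (83, 85, 60.1, 63.4, 63.5, 66.2),
    "List-item": (87, 88, 81.2, 80.8, 81.0, 86.2),
    "Page-footer": (93, 94, 61.6, 59.3, 58.9, 61.1),
    "Page-header": (85, 89, 71.9, 70.0, 72.0, 67.9),
    "Picture": (69, 71, 71.7, 72.7, 72.0, 77.1),
    "Section-header": (83, 84, 67.6, 69.3, 68.4, 74.6),
    "Table": (77, 81, 82.2, 82.9, 82.2, 86.3),
    "Text": (84, 86, 84.6, 85.8, 85.4, 88.1),
    "Title": (60, 72, 76.7, 80.4, 79.9, 82.7),
}


def best_model(label):
    """Return (model name, score) of the highest-scoring detector for a label."""
    values = SCORES[label][2:]  # skip the two human-range columns
    idx = max(range(len(values)), key=values.__getitem__)
    return MODELS[idx], values[idx]


def classes_above_human(model):
    """Return the labels where `model` scores above the top of the human range."""
    col = MODELS.index(model) + 2  # offset past the two human-range columns
    return [label for label, row in SCORES.items() if row[col] > row[1]]


if __name__ == "__main__":
    for model in MODELS:
        print(model, classes_above_human(model))
```

With the values as transcribed, `classes_above_human("YOLO v5x6")` returns Picture, Table, Text, and Title, while every MRCNN and FRCNN column exceeds the human range only for Picture, Table, and Title.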