Cyrile committed (verified)
Commit 74f402a · 1 parent: 5d6e31d

Update README.md

Files changed (1): README.md (+41 −33)
@@ -3,39 +3,47 @@ library_name: transformers
 tags: []
 ---
 
-# Model Card for Model ID
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-
-
-## Model Details
-
-### Model Description
-
-<!-- Provide a longer summary of what this model is. -->
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-
-### Model Sources [optional]
-
-<!-- Provide the basic links for the model. -->
-
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-
-## Uses
-
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+# DIT-base-layout-detection
+
+We present cmarkea/dit-base-layout-detection, a model that extracts layout elements (Text, Picture, Caption, Footnote, etc.) from document images.
+It is a fine-tuned version of [dit-base](https://huggingface.co/microsoft/dit-base) on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
+dataset. The model jointly predicts segmentation masks and bounding boxes for document objects, which makes it well suited for preparing document
+corpora for ingestion into an ODQA (open-domain question answering) system.
+
+The model distinguishes 11 entity classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.
+
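As an illustration of the mask-to-box step mentioned above (this is a sketch, not the model's actual post-processing, which is not shown in this card), a per-pixel class map can be reduced to one bounding box per detected class. The label ids below are hypothetical; the real mapping should be read from the model's `id2label` config.

```python
import numpy as np

# Hypothetical label ids for illustration only; the fine-tuned model's
# actual id2label mapping lives in its configuration.
LABELS = {1: "Text", 2: "Table", 3: "Picture"}

def masks_to_boxes(seg_map):
    """Reduce a per-pixel class map of shape (H, W) to one axis-aligned
    bounding box per class present, as {label: (x_min, y_min, x_max, y_max)}."""
    boxes = {}
    for cls, name in LABELS.items():
        ys, xs = np.nonzero(seg_map == cls)  # pixels assigned to this class
        if len(xs) == 0:
            continue  # class absent from this page
        boxes[name] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes
```

A real pipeline would typically also split disconnected regions of the same class into separate boxes; this sketch keeps one box per class for brevity.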
+## Performance
+
+In this section, we assess the model's performance on semantic segmentation and object detection separately. In both cases, no post-processing
+was applied to the predictions.
+
+For semantic segmentation, we use the F1-score over the per-pixel classification. For object detection, we report the Generalized Intersection
+over Union (GIoU) and the accuracy of the predicted bounding-box class. The evaluation is conducted on 500 pages from the PDF evaluation set of
+DocLayNet.
+
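To make the two metrics concrete, here is a small self-contained sketch of the per-pixel F1-score and of GIoU for axis-aligned boxes (the exact evaluation code used for the numbers below is not shown in this card):

```python
import numpy as np

def pixel_f1(pred, target, cls):
    # Per-pixel F1 for one class: 2*TP / (2*TP + FP + FN).
    tp = np.sum((pred == cls) & (target == cls))
    fp = np.sum((pred == cls) & (target != cls))
    fn = np.sum((pred != cls) & (target == cls))
    return 2 * tp / (2 * tp + fp + fn)

def giou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2). GIoU = IoU - |C \ (A ∪ B)| / |C|,
    # where C is the smallest box enclosing both A and B.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    area_c = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (area_c - union) / area_c
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: it decreases toward −1 as the predicted box drifts away from the target, which is why it is reported here on a ×100 scale.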
+| Class          | F1-score (×100) | GIoU (×100) | Accuracy (×100) |
+|:--------------:|:---------------:|:-----------:|:---------------:|
+| Background     | 94.98           | NA          | NA              |
+| Caption        | 75.54           | 55.61       | 72.62           |
+| Footnote       | 72.29           | 50.08       | 70.97           |
+| Formula        | 82.29           | 49.91       | 94.48           |
+| List-item      | 67.56           | 35.19       | 69.00           |
+| Page-footer    | 83.93           | 57.99       | 94.06           |
+| Page-header    | 62.33           | 65.25       | 79.39           |
+| Picture        | 78.32           | 58.22       | 92.71           |
+| Section-header | 69.55           | 56.64       | 78.29           |
+| Table          | 83.69           | 63.03       | 90.13           |
+| Text           | 90.94           | 51.89       | 88.09           |
+| Title          | 61.19           | 52.64       | 70.00           |
+
+## Benchmark
+
+Now, let's compare the performance of this model with that of other models.
+
+| Model                                                                                  | F1-score (×100) | GIoU (×100) | Accuracy (×100) |
+|:--------------------------------------------------------------------------------------:|:---------------:|:-----------:|:---------------:|
+| cmarkea/dit-base-layout-detection                                                      | 90.77           | 56.29       | 85.26           |
+| [cmarkea/detr-layout-detection](https://huggingface.co/cmarkea/detr-layout-detection)  | 84.23           | 43.84       | 71.98           |
 
 
 ### Direct Use