Update README.md
README.md
CHANGED
@@ -3,39 +3,47 @@ library_name: transformers
tags: []
---

# DIT-base-layout-detection

We present the model cmarkea/dit-base-layout-detection, which extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document.
It is a fine-tuned version of [dit-base](https://huggingface.co/microsoft/dit-base) trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
dataset. The model jointly predicts masks and bounding boxes for document objects. It is well suited to processing document corpora to be ingested into an
open-domain question answering (ODQA) system.

The model extracts 11 entity types: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.

## Performance

In this section, we assess the model's performance by considering semantic segmentation and object detection separately. In both cases, no post-processing was
applied after estimation.

For semantic segmentation, we use the F1-score to evaluate the classification of each pixel. For object detection, we assess performance using the
Generalized Intersection over Union (GIoU) and the accuracy of the predicted bounding box class. The evaluation is conducted on 500 pages from the PDF evaluation
dataset of DocLayNet.
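
To make the two metrics concrete, here is a minimal pure-Python sketch of a per-class pixel F1-score and of the GIoU between two boxes. It is an illustration only: the function names `pixel_f1` and `giou`, the `(x_min, y_min, x_max, y_max)` box format, and the nested-list mask format are assumptions made for this example, and the sketch does not reproduce the evaluation script behind the reported scores (in particular, how predicted boxes are matched to ground-truth boxes is not specified here).

```python
def pixel_f1(pred_mask, true_mask, class_id):
    """Per-class F1 over pixels; masks are equally sized nested lists of class ids."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if pred == class_id and true == class_id:
                tp += 1  # pixel correctly assigned to the class
            elif pred == class_id:
                fp += 1  # pixel wrongly assigned to the class
            elif true == class_id:
                fn += 1  # pixel of the class that was missed
    denom = 2 * tp + fp + fn
    # Convention chosen for this sketch: F1 is 0 when the class is absent everywhere.
    return 2 * tp / denom if denom else 0.0


def giou(box_a, box_b):
    """GIoU of two boxes (x_min, y_min, x_max, y_max) with strictly positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection rectangle (width and height clamp to zero when boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    # IoU minus the fraction of the enclosing box not covered by the union.
    return inter / union - (hull - union) / hull


if __name__ == "__main__":
    pred = [[1, 1], [0, 2]]
    true = [[1, 0], [0, 2]]
    print(round(100 * pixel_f1(pred, true, class_id=1), 2))
    print(round(100 * giou((0, 0, 2, 2), (1, 0, 3, 2)), 2))
```

Unlike plain IoU, GIoU ranges from -1 to 1 and stays informative for boxes that do not overlap, since the penalty term grows as they move apart. The scores in this README are reported multiplied by 100.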

| Class          | F1-score (x100) | GIoU (x100) | Accuracy (x100) |
|:--------------:|:---------------:|:-----------:|:---------------:|
| Background     | 94.98           | NA          | NA              |
| Caption        | 75.54           | 55.61       | 72.62           |
| Footnote       | 72.29           | 50.08       | 70.97           |
| Formula        | 82.29           | 49.91       | 94.48           |
| List-item      | 67.56           | 35.19       | 69              |
| Page-footer    | 83.93           | 57.99       | 94.06           |
| Page-header    | 62.33           | 65.25       | 79.39           |
| Picture        | 78.32           | 58.22       | 92.71           |
| Section-header | 69.55           | 56.64       | 78.29           |
| Table          | 83.69           | 63.03       | 90.13           |
| Text           | 90.94           | 51.89       | 88.09           |
| Title          | 61.19           | 52.64       | 70              |

## Benchmark

We now compare the performance of this model with other models.

| Model                                                                                 | F1-score (x100) | GIoU (x100) | Accuracy (x100) |
|:-------------------------------------------------------------------------------------:|:---------------:|:-----------:|:---------------:|
| cmarkea/dit-base-layout-detection                                                     | 90.77           | 56.29       | 85.26           |
| [cmarkea/detr-layout-detection](https://huggingface.co/cmarkea/detr-layout-detection) | 84.23           | 43.84       | 71.98           |

### Direct Use
