cmarkea
/

dit-base-layout-detection

Image Segmentation

Transformers

Safetensors

beit

Inference Endpoints


Cyrile committed on Aug 14, 2024

Commit

74f402a

verified ·

1 Parent(s): 5d6e31d

Update README.md

Files changed (1)

README.md +41 -33

@@ -3,39 +3,47 @@ library_name: transformers
 tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+# DIT-base-layout-detection
+We present cmarkea/dit-base-layout-detection, a model that extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document.
+It is a fine-tuned version of [dit-base](https://huggingface.co/microsoft/dit-base) trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
+dataset. The model jointly predicts masks and bounding boxes for document objects. It is well suited to processing document corpora to be ingested into an
+open-domain question answering (ODQA) system.
+The model extracts 11 entity classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.
+## Performance
+In this section, we assess the model's performance on semantic segmentation and object detection separately. In both cases, no post-processing was
+applied after estimation.
+For semantic segmentation, we use the F1-score to evaluate the classification of each pixel. For object detection, we report the
+Generalized Intersection over Union (GIoU) and the accuracy of the predicted bounding box class. The evaluation is conducted on 500 pages from the PDF evaluation
+dataset of DocLayNet.
+|      Class     | f1-score (x100) | GIoU (x100) | accuracy (x100) |
+|:--------------:|:---------------:|:-----------:|:---------------:|
+|   Background   |      94.98      |      NA     |        NA       |
+|     Caption    |      75.54      |    55.61    |      72.62      |
+|    Footnote    |      72.29      |    50.08    |      70.97      |
+|     Formula    |      82.29      |    49.91    |      94.48      |
+|    List-item   |      67.56      |    35.19    |      69         |
+|   Page-footer  |      83.93      |    57.99    |      94.06      |
+|   Page-header  |      62.33      |    65.25    |      79.39      |
+|     Picture    |      78.32      |    58.22    |      92.71      |
+| Section-header |      69.55      |    56.64    |      78.29      |
+|      Table     |      83.69      |    63.03    |      90.13      |
+|      Text      |      90.94      |    51.89    |      88.09      |
+|      Title     |      61.19      |    52.64    |      70         |
+## Benchmark
+We now compare the performance of this model with that of other models.
+|      Model                                                                                    | f1-score (x100) | GIoU (x100) | accuracy (x100) |
+|:---------------------------------------------------------------------------------------------:|:---------------:|:-----------:|:---------------:|
+| cmarkea/dit-base-layout-detection                                                             |      90.77      |    56.29    |      85.26      |
+| [cmarkea/detr-layout-detection](https://huggingface.co/cmarkea/detr-layout-detection)         |      84.23      |    43.84    |      71.98      |
 ### Direct Use
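
The Performance section added in this commit reports a per-pixel F1-score and a Generalized Intersection over Union (GIoU) for bounding boxes. As a rough illustration of what those two metrics measure, here is a minimal pure-Python sketch. The function names `giou` and `pixel_f1`, the `(x_min, y_min, x_max, y_max)` box format, and the nested-list label maps are assumptions made for this example. The model card does not state how predicted boxes are matched to ground-truth boxes or how scores are averaged, so this is not the author's evaluation code.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x_min, y_min, x_max, y_max).

    The box format is an assumption for this sketch.
    Returns a value in [-1, 1]; 1 means the boxes coincide.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    area_a = max(0.0, ax1 - ax0) * max(0.0, ay1 - ay0)
    area_b = max(0.0, bx1 - bx0) * max(0.0, by1 - by0)

    # Overlap rectangle (zero width or height when the boxes are disjoint).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclosing = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))

    if union <= 0 or enclosing <= 0:
        return 0.0  # degenerate boxes: no meaningful score

    iou = inter / union
    # GIoU penalises the part of the enclosing box not covered by the union.
    return iou - (enclosing - union) / enclosing


def pixel_f1(pred, target, cls):
    """F1-score of class `cls` over two label maps of the same shape.

    `pred` and `target` are lists of rows of integer class ids
    (a hypothetical representation chosen to keep the sketch dependency-free).
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Two unit-offset 2x2 boxes: IoU = 1/7, enclosing area = 9.
    print(round(100 * giou((0, 0, 2, 2), (1, 1, 3, 3)), 2))
    # A 2x2 label map where class 1 is over-predicted by one pixel.
    print(round(100 * pixel_f1([[1, 1], [0, 0]], [[1, 0], [0, 0]], cls=1), 2))
```

The tables in the card report these quantities multiplied by 100, which is why the usage lines scale the results the same way. Unlike plain IoU, GIoU can be negative when boxes overlap little or not at all, so a low GIoU column does not necessarily mean the predicted class is wrong.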