---
library_name: transformers
---

Microsoft Table Transformer for table structure recognition, trained on PubTables1M and FinTabNet.

If your deepdoctection installation does not already include a profile for this model, register one:

```python
import deepdoctection as dd

# Register a profile so deepdoctection can locate the weights and config files.
dd.ModelCatalog.register(
    "deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
    dd.ModelProfile(
        name="deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
        description="Table Transformer (DETR) model trained on PubTables1M. It was introduced in the paper "
        "Aligning benchmark datasets for table structure recognition by Smock et "
        "al. This model is devoted to table structure recognition and expects a slightly cropped "
        "table as input. It will predict rows, columns and spanning cells. Use a padding of around 5 pixels.",
        size=[115511753],
        tp_model=False,
        config="deepdoctection/tatr_tab_struct_v2/config.json",
        preprocessor_config="deepdoctection/tatr_tab_struct_v2/preprocessor_config.json",
        # Hugging Face Hub location of the weights and config files
        hf_repo_id="deepdoctection/tatr_tab_struct_v2",
        hf_model_name="pytorch_model.bin",
        hf_config_file=["config.json", "preprocessor_config.json"],
        # Mapping from model category ids to deepdoctection types
        categories={
            "1": dd.LayoutType.table,
            "2": dd.LayoutType.column,
            "3": dd.LayoutType.row,
            "4": dd.CellType.column_header,
            "5": dd.CellType.projected_row_header,
            "6": dd.CellType.spanning,
        },
        dl_library="PT",  # PyTorch
        model_wrapper="HFDetrDerivedDetector",
    ),
)
```

When running the model within the deepdoctection analyzer, adjust the padding and segmentation parameters to get better predictions.

```python
import deepdoctection as dd

analyzer = dd.get_dd_analyzer(
    reset_config_file=True,
    config_overwrite=[
        # Use this model for table structure recognition
        "PT.ITEM.WEIGHTS=deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
        "PT.ITEM.FILTER=['table']",
        # Padding of 5 pixels on each side of the table
        "PT.ITEM.PAD.TOP=5",
        "PT.ITEM.PAD.RIGHT=5",
        "PT.ITEM.PAD.BOTTOM=5",
        "PT.ITEM.PAD.LEFT=5",
        # Segmentation parameters for rows and columns
        "SEGMENTATION.THRESHOLD_ROWS=0.9",
        "SEGMENTATION.THRESHOLD_COLS=0.9",
        "SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS=0.3",
        "SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS=0.3",
        "WORD_MATCHING.MAX_PARENT_ONLY=True",
    ],
)
```
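
The overrides passed to the analyzer are plain `KEY=VALUE` strings, so when experimenting with different padding or threshold values it may be convenient to generate the list rather than edit it by hand. The sketch below does that with the standard library only. `build_overrides` and its parameter names are hypothetical and not part of deepdoctection; the keys and default values are the ones used in this card.

```python
from typing import List


def build_overrides(
    weights: str,
    pad: int = 5,
    threshold: float = 0.9,
    remove_iou_threshold: float = 0.3,
) -> List[str]:
    """Return KEY=VALUE strings in the form used for `config_overwrite`.

    Hypothetical helper: it only formats strings and does not call deepdoctection.
    """
    overrides = [
        f"PT.ITEM.WEIGHTS={weights}",
        "PT.ITEM.FILTER=['table']",
    ]
    # Apply the same padding to all four sides.
    for side in ("TOP", "RIGHT", "BOTTOM", "LEFT"):
        overrides.append(f"PT.ITEM.PAD.{side}={pad}")
    # Use the same thresholds for rows and columns.
    for axis in ("ROWS", "COLS"):
        overrides.append(f"SEGMENTATION.THRESHOLD_{axis}={threshold}")
    for axis in ("ROWS", "COLS"):
        overrides.append(f"SEGMENTATION.REMOVE_IOU_THRESHOLD_{axis}={remove_iou_threshold}")
    overrides.append("WORD_MATCHING.MAX_PARENT_ONLY=True")
    return overrides


overrides = build_overrides("deepdoctection/tatr_tab_struct_v2/pytorch_model.bin")
```

With the defaults, `overrides` contains the same eleven entries as the list in this card, and it can be passed as `config_overwrite=overrides` to `dd.get_dd_analyzer`.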