JaMe76 committed · Commit acd963e · 1 Parent(s): 4aaac75

Create README.md

  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
---
library_name: transformers
---

Microsoft Table Transformer for table structure recognition, trained on PubTables-1M and FinTabNet.

If your deepdoctection installation does not already include a profile for this model, register it first:

```python
import deepdoctection as dd

# Register the model profile so deepdoctection can resolve the weights and config files.
dd.ModelCatalog.register(
    "deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
    dd.ModelProfile(
        name="deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
        description="Table Transformer (DETR) model trained on PubTables1M. It was introduced in the paper "
        "Aligning benchmark datasets for table structure recognition by Smock et "
        "al. This model is devoted to table structure recognition and expects a slightly cropped "
        "table as input. It will predict rows, columns and spanning cells. Use a padding of around 5 pixels",
        size=[115511753],
        tp_model=False,
        config="deepdoctection/tatr_tab_struct_v2/config.json",
        preprocessor_config="deepdoctection/tatr_tab_struct_v2/preprocessor_config.json",
        hf_repo_id="deepdoctection/tatr_tab_struct_v2",
        hf_model_name="pytorch_model.bin",
        hf_config_file=["config.json", "preprocessor_config.json"],
        # Category ids predicted by the model, mapped to deepdoctection types.
        categories={
            "1": dd.LayoutType.table,
            "2": dd.LayoutType.column,
            "3": dd.LayoutType.row,
            "4": dd.CellType.column_header,
            "5": dd.CellType.projected_row_header,
            "6": dd.CellType.spanning,
        },
        dl_library="PT",
        model_wrapper="HFDetrDerivedDetector",
    ),
)
```
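The profile description recommends a padding of around 5 pixels around the table. As a rough illustration of what that can mean, the sketch below enlarges a table bounding box by a fixed margin and clips it to the page. The helper `pad_box` and the example coordinates are hypothetical and not part of deepdoctection, which may apply padding differently (for instance by adding a border to the cropped image).

```python
def pad_box(box, page_width, page_height, pad=5):
    """Enlarge (x0, y0, x1, y1) by `pad` pixels per side, clipped to the page.

    Hypothetical helper for illustration only; not a deepdoctection API.
    """
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - pad),
        max(0, y0 - pad),
        min(page_width, x1 + pad),
        min(page_height, y1 + pad),
    )


# Made-up table box close to the right page edge.
table_box = (100, 200, 500, 600)
padded = pad_box(table_box, page_width=504, page_height=800)
print(padded)  # (95, 195, 504, 605)
```

The right edge is clipped to the page width, so the crop never reaches outside the image.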

When running the model within the deepdoctection analyzer, adjust the padding and segmentation parameters to get better predictions:

```python
import deepdoctection as dd

analyzer = dd.get_dd_analyzer(
    reset_config_file=True,
    config_overwrite=[
        # Use the table structure recognition weights registered above.
        "PT.ITEM.WEIGHTS=deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
        "PT.ITEM.FILTER=['table']",
        # Padding of 5 pixels on every side of the table.
        "PT.ITEM.PAD.TOP=5",
        "PT.ITEM.PAD.RIGHT=5",
        "PT.ITEM.PAD.BOTTOM=5",
        "PT.ITEM.PAD.LEFT=5",
        # Segmentation parameters for rows and columns.
        "SEGMENTATION.THRESHOLD_ROWS=0.9",
        "SEGMENTATION.THRESHOLD_COLS=0.9",
        "SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS=0.3",
        "SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS=0.3",
        "WORD_MATCHING.MAX_PARENT_ONLY=True",
    ],
)
```
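When experimenting with different padding or threshold values, it can be easier to keep the settings in a dictionary and generate the `KEY=value` strings from it. The following is a minimal sketch in plain Python; `build_config_overwrite` is a hypothetical helper, not a deepdoctection function, and the values simply repeat those used in this README.

```python
def build_config_overwrite(settings):
    """Turn a {key: value} mapping into a list of "KEY=value" strings.

    Hypothetical convenience helper; not part of deepdoctection.
    """
    return [f"{key}={value}" for key, value in settings.items()]


# Values copied from this README; tune them for your own documents.
settings = {
    "PT.ITEM.FILTER": ["table"],
    "PT.ITEM.PAD.TOP": 5,
    "PT.ITEM.PAD.RIGHT": 5,
    "PT.ITEM.PAD.BOTTOM": 5,
    "PT.ITEM.PAD.LEFT": 5,
    "SEGMENTATION.THRESHOLD_ROWS": 0.9,
    "SEGMENTATION.THRESHOLD_COLS": 0.9,
    "SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS": 0.3,
    "SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS": 0.3,
    "WORD_MATCHING.MAX_PARENT_ONLY": True,
}

config_overwrite = build_config_overwrite(settings)
for entry in config_overwrite:
    print(entry)
```

The resulting list can then be passed as the `config_overwrite` argument of `dd.get_dd_analyzer`, together with the `PT.ITEM.WEIGHTS` entry.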