VictorChew committed on
Commit
3ea09f6
•
1 Parent(s): e7dbc7c

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +17 -12
README.md CHANGED
@@ -13,7 +13,7 @@ license: apache-2.0
 
 [[ Github Repo ]](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) [[ Related Paper ]](https://arxiv.org/abs/2406.11633) [[ Website ]](https://unimodal4reasoning.github.io/DocGenome_page/)
 
- [[ Dataset (Google Drive) ]](https://drive.google.com/drive/folders/1OIhnuQdIjuSSDc_QL2nP4NwugVDgtItD) [[ Dataset (Hugging Face) ]](https://huggingface.co/datasets/U4R/DocGenome/tree/main) [[ Models 🤗 (Hugging Face) ]](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main)
+ [[ Dataset 🤗 ]](https://huggingface.co/datasets/U4R/DocGenome/tree/main) [[ Models 🤗 ]](https://huggingface.co/U4R/StructTable-InternVL2-1B/tree/main)
 
 
 </div>
@@ -24,7 +24,7 @@ Welcome to the official repository of StructEqTable-Deploy, a solution that conv
 Table is an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing downstream reasoning tasks on the extracted data is challenging, mainly because tables often present complicated column and row headers with spanning cells. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality Image-LaTeX pairs covering 156 disciplinary classes. Benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which precisely produces the corresponding LaTeX description from a visual table image and performs multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.
 
 ## Changelog
- - [2024/10/19] 🔥 We have released our **latest model [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main)**!
+ - [2024/10/19] 🔥 We have released our **latest model [StructTable-InternVL2-1B](https://huggingface.co/U4R/StructTable-InternVL2-1B/tree/main)**!
 
   Thanks to InternVL2's powerful foundational capabilities, and through fine-tuning on synthetic tabular data and the DocGenome dataset, StructTable can convert a table image into common table formats including LaTeX, HTML, and Markdown. Moreover, inference speed has been significantly improved compared to the v0.2 version.
 - [2024/8/22] We have released StructTable-base-v0.2, fine-tuned on the DocGenome dataset. This version features improved inference speed and robustness, achieved through data augmentation and a reduced image token count.
@@ -37,7 +37,7 @@ Table is an effective way to represent structured data in scientific publication
 - [x] Support Chinese version of StructEqTable.
 - [x] Accelerated version of StructEqTable using TensorRT-LLM.
 - [x] Expand more domains of table images to improve the model's general capabilities.
- - [x] Efficient inference of StructTable-InternVL2-1B by [LMDepoly](https://github.com/InternLM/lmdeploy) Tookit.
+ - [x] Efficient inference of StructTable-InternVL2-1B with the [LMDeploy](https://github.com/InternLM/lmdeploy) toolkit.
 - [ ] Release our table pre-training and fine-tuning code
 
 
@@ -60,11 +60,11 @@ pip install struct-eqtable==0.3.0
 
 ## Model Zoo
 
- | Model | Model Size | Training Data | Data Augmentation | LMDepoly | TensorRT | HuggingFace |
+ | Base Model | Model Size | Training Data | Data Augmentation | LMDeploy | TensorRT | HuggingFace |
 |---------------------|------------|------------------|-------------------|----------|----------|-------------------|
- | StructEqTable-InternVL | ~1B | DocGenome and Synthetic Data | ✔ | ✔ | | [v0.3](https://huggingface.co/U4R/StructTable-InternVL-1B/tree/main) |
- | StructEqTable-base | ~300M | DocGenome | ✔ | | ✔ | [v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
- | StructEqTable-base | ~300M | DocGenome | | | ✔ | [v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
+ | InternVL2-1B | ~1B | DocGenome and Synthetic Data | ✔ | ✔ | | [StructTable v0.3](https://huggingface.co/U4R/StructTable-InternVL2-1B/tree/main) |
+ | Pix2Struct-base | ~300M | DocGenome | ✔ | | ✔ | [StructTable v0.2](https://huggingface.co/U4R/StructTable-base/tree/v0.2) |
+ | Pix2Struct-base | ~300M | DocGenome | | | ✔ | [StructTable v0.1](https://huggingface.co/U4R/StructTable-base/tree/v0.1) |
 
 
 
@@ -75,7 +75,7 @@ cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
-   --ckpt_path U4R/StructTable-InternVL-1B \
+   --ckpt_path U4R/StructTable-InternVL2-1B \
   --output_format latex
 ```
 
@@ -84,7 +84,7 @@ python demo.py \
 ```shell script
 python demo.py \
   --image_path ./demo.png \
-   --ckpt_path U4R/StructTable-InternVL-1B \
+   --ckpt_path U4R/StructTable-InternVL2-1B \
   --output_format html markdown
 ```
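
For batch use, the demo CLI shown above is easy to script. The following is a minimal, illustrative Python wrapper, not part of the repository: it shells out to `demo.py` using only the flags documented in this README, and the folder layout and file names it assumes are hypothetical. The `--lmdeploy` flag from the accelerated example below could be appended in the same way.

```python
# batch_convert.py -- illustrative sketch, not part of StructEqTable-Deploy.
# Run from tools/demo; converts every PNG in the current folder by calling
# demo.py with the CLI flags documented in this README.
import subprocess
from pathlib import Path

CKPT = "U4R/StructTable-InternVL2-1B"  # checkpoint name from the README


def convert(image: Path, output_format: str = "latex") -> None:
    """Convert one table image; raises CalledProcessError if demo.py fails."""
    subprocess.run(
        [
            "python", "demo.py",
            "--image_path", str(image),
            "--ckpt_path", CKPT,
            "--output_format", output_format,
        ],
        check=True,
    )


if __name__ == "__main__":
    for img in sorted(Path(".").glob("*.png")):  # assumed input layout
        convert(img, "latex")
```
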
@@ -100,18 +100,24 @@ cd tools/demo
 
 python demo.py \
   --image_path ./demo.png \
-   --ckpt_path U4R/StructTable-InternVL-1B \
+   --ckpt_path U4R/StructTable-InternVL2-1B \
   --output_format latex \
   --lmdeploy
 ```
 
 
+ - Visualization Result
+
+   You can copy the output LaTeX code into [demo.tex](../tools/demo/demo.tex), then use [Overleaf](https://www.overleaf.com/project) for table visualization.
+   ![](docs/imgs/output.png)
+
+
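
To preview the generated LaTeX locally instead of on Overleaf, a minimal standalone wrapper such as the sketch below is enough. Since the repository's `demo.tex` is not reproduced here, the preamble and the placeholder table body are assumptions; paste the model's output in place of the placeholder.

```latex
% wrapper.tex -- illustrative sketch; the repo's demo.tex is not shown here.
% Paste the LaTeX emitted by StructTable inside the document body.
\documentclass[border=5pt]{standalone}
\usepackage{multirow}  % assumed: generated tables often use spanning cells
\usepackage{booktabs}
\begin{document}
% Placeholder table built from facts in the Model Zoo above:
\begin{tabular}{lll}
\toprule
Model & Size & Output formats \\
\midrule
StructTable v0.3 & $\sim$1B   & LaTeX, HTML, Markdown \\
StructTable v0.2 & $\sim$300M & LaTeX \\
\bottomrule
\end{tabular}
\end{document}
```

Compiling with `pdflatex wrapper.tex` yields a tightly cropped PDF of just the table.
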
 ## Acknowledgements
 - [DocGenome](https://github.com/UniModal4Reasoning/DocGenome). An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
 - [ChartVLM](https://github.com/UniModal4Reasoning/ChartVLM). A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
 - [Pix2Struct](https://github.com/google-research/pix2struct). Screenshot Parsing as Pretraining for Visual Language Understanding.
 - [InternVL Family](https://github.com/OpenGVLab/InternVL). A Series of Powerful Foundational Vision-Language Models.
- - [LMDepoly](https://github.com/InternLM/lmdeploy). A toolkit for compressing, deploying, and serving LLM and MLLM.
+ - [LMDeploy](https://github.com/InternLM/lmdeploy). A toolkit for compressing, deploying, and serving LLMs and MLLMs.
 - [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
 - [Donut](https://huggingface.co/naver-clova-ix/donut-base). UniMERNet's Transformer encoder-decoder is adapted from Donut.
 - [Nougat](https://github.com/facebookresearch/nougat). Data augmentation follows Nougat.
@@ -134,4 +140,3 @@ If you find our models / code / papers useful in your research, please consider
 
 ## Contact Us
 If you encounter any issues or have questions, please feel free to contact us via zhouhongbin@pjlab.org.cn.
-
 