ali6parmak
committed on
Update README.md
README.md
CHANGED
@@ -172,6 +172,28 @@ we process them after sorting all segments with content. To determine their read
 using distance as a criterion.
 
 
+### Extracting Tables and Formulas
+
+Our service can extract tables and formulas in different formats.
+
+By default, the "text" property of each formula segment contains the formula in LaTeX format.
+
+Tables can also be extracted in formats such as "markdown", "latex", or "html", but this is not enabled by default.
+To enable it, set the "extraction_format" parameter. Some example usages are shown below:
+
+```
+curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060 -F "extraction_format=latex"
+```
+```
+curl -X POST -F 'file=@/PATH/TO/PDF/pdf_name.pdf' localhost:5060/fast -F "extraction_format=markdown"
+```
+
+Be aware that this additional extraction step can make processing take much longer, especially if the document contains a large number of tables.
+
+(For table extraction, we use [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy),
+and for formula extraction, we use [RapidLaTeXOCR](https://github.com/RapidAI/RapidLaTeXOCR).)
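As a rough sketch of how the two curl calls in this section could be wrapped, the shell functions below pick the endpoint and check the format before sending the request. The function names (`extraction_url`, `is_supported_format`, `extract_tables`) are hypothetical and not part of the service. The list of accepted formats is an assumption based only on the three values named in the text, so the service may accept others.

```shell
# Hypothetical helper: print the endpoint for the given mode.
# "fast" selects the /fast endpoint; anything else uses the default one.
extraction_url() {
  if [ "$1" = "fast" ]; then
    echo "localhost:5060/fast"
  else
    echo "localhost:5060"
  fi
}

# Assumption: only the three formats named in the README text are accepted.
is_supported_format() {
  case "$1" in
    markdown|latex|html) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: POST a PDF with the "extraction_format" parameter set.
# Usage: extract_tables PDF_PATH FORMAT [fast]
extract_tables() {
  pdf_path=$1
  format=$2
  mode=${3:-default}
  if ! is_supported_format "$format"; then
    echo "unsupported extraction_format: $format" >&2
    return 1
  fi
  curl -X POST \
    -F "file=@${pdf_path}" \
    -F "extraction_format=${format}" \
    "$(extraction_url "$mode")"
}
```

For instance, `extract_tables /PATH/TO/PDF/pdf_name.pdf markdown fast` would send the same form fields to the same endpoint as the second curl command in this section. Rejecting an unknown format locally avoids starting a potentially slow extraction run with a value the service may not understand.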
+
+
 ## Benchmark
 
 These are the benchmark results for VGT model on PubLayNet dataset: