---
license: mit
---

# LayoutReader

**TODO:**

1. Upload models to Hugging Face
2. Explain why this repo exists
3. Explain the new dataset
4. Build a Docker image

## Helper

### Build Dataset

```bash
python tools.py cache-dataset-spans --help
```

### Train

```bash
bash train.sh
```

### Eval

```bash
python eval.py --help
```

## Spans-Level Results

Each bbox contains multiple tokens. The bboxes are usually obtained by parsing a PDF file. Training data is generated by `tools.py`.

> Only the first part of the test file is used.

| Method                     | Shuffled | BLEU Idx | BLEU Token |
|----------------------------|----------|----------|------------|
| Heuristic Method           | no       | 44.4     | 70.7       |
| LayoutReader (layout only) | no       | 95.3     | 97.8       |
| LayoutReader (layout only) | yes      | 95.0     | 97.6       |

## Tokens-Level Results

Each bbox contains exactly one token.

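
To make the difference between the two granularities concrete, here is a minimal sketch of how a page could be represented in memory. The `Box` class, its field names, and both helper functions are hypothetical illustrations; they are not the format produced by `tools.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical bbox record: pixel coordinates plus the tokens it covers."""
    left: int
    top: int
    right: int
    bottom: int
    tokens: List[str]


def is_token_level(boxes: List[Box]) -> bool:
    """True when every bbox holds exactly one token (tokens-level data)."""
    return all(len(box.tokens) == 1 for box in boxes)


def flatten_tokens(boxes: List[Box]) -> List[str]:
    """Concatenate the tokens of all bboxes, keeping the bbox order."""
    return [token for box in boxes for token in box.tokens]


# Spans-level: one bbox covers several tokens.
span_boxes = [Box(10, 10, 120, 20, ["reading", "order"]), Box(10, 30, 80, 40, ["detection"])]
# Tokens-level: one bbox per token.
token_boxes = [Box(10, 10, 60, 20, ["reading"]), Box(65, 10, 120, 20, ["order"])]
```

Under this assumed representation, `is_token_level(span_boxes)` is false and `is_token_level(token_boxes)` is true.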
### New eval script

> Only the first part of the test file is used.

| Method                      | Shuffled | BLEU Idx | BLEU Token |
|-----------------------------|----------|----------|------------|
| Heuristic Method            | no       | 78.3     | 79.4       |
| LayoutReader (layout only)  | no       | 98.0     | 98.2       |
| LayoutReader (layout only)  | yes      | 97.8     | 98.0       |
| LayoutReader (public model) | no       | 98.0     | 98.3       |

### Old eval script (from original paper)

* Evaluation results of LayoutReader on the reading order detection task, where the source side of the training/testing data is in left-to-right, top-to-bottom order.

| Method                     | Encoder                | BLEU   | ARD  |
|----------------------------|------------------------|--------|------|
| Heuristic Method           | -                      | 0.6972 | 8.46 |
| LayoutReader (layout only) | LayoutLM (layout only) | 0.9732 | 2.31 |
| LayoutReader               | LayoutLM               | 0.9819 | 1.75 |

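
This README does not spell out how the left-to-right, top-to-bottom order is computed. A minimal sketch of one common way to do it, assuming each bbox is a `(left, top, right, bottom)` tuple, is to sort on the top coordinate first and the left coordinate second. The function name and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed bbox layout: (left, top, right, bottom)
Box = Tuple[int, int, int, int]


def reading_order_indices(boxes: List[Box]) -> List[int]:
    """Return bbox indices sorted top-to-bottom, then left-to-right."""
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))


boxes = [
    (50, 10, 90, 20),  # first row, right
    (10, 10, 40, 20),  # first row, left
    (10, 30, 40, 40),  # second row
]
order = reading_order_indices(boxes)  # [1, 0, 2]
```

A plain coordinate sort like this interleaves the columns of a multi-column page, because boxes from different columns that share a top coordinate end up next to each other.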
* Input order study with left-to-right and top-to-bottom inputs in evaluation, where r is the proportion of shuffled samples in training.

| Method                     | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
|----------------------------|---------------|--------------|-------------|--------------|-------------|------------|
| LayoutReader (layout only) | 0.9701        | 0.9729       | 0.9732      | 2.85         | 2.61        | 2.31       |
| LayoutReader               | 0.9765        | 0.9788       | 0.9819      | 2.50         | 2.24        | 1.75       |

* Input order study with token-shuffled inputs in evaluation, where r is the proportion of shuffled samples in training.

| Method                     | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
|----------------------------|---------------|--------------|-------------|--------------|-------------|------------|
| LayoutReader (layout only) | 0.9718        | 0.9714       | 0.1331      | 2.72         | 2.82        | 105.40     |
| LayoutReader               | 0.9772        | 0.9770       | 0.1783      | 2.48         | 2.46        | 72.94      |

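
The proportion r in the input order studies can be illustrated with a short sketch that shuffles the token order in a fraction r of the training samples and leaves the rest untouched. The function name, the `seed` parameter, and the choice of an exact proportion (rather than a per-sample probability) are assumptions; the procedure used by the original paper or by `train.sh` may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_proportion(samples: Sequence[Sequence[T]], r: float, seed: int = 0) -> List[List[T]]:
    """Shuffle the item order inside a proportion r of the samples.

    r=0.0 leaves every sample in its original order; r=1.0 shuffles all of them.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    rng = random.Random(seed)  # local generator keeps the result reproducible
    n_shuffled = round(r * len(samples))
    chosen = set(rng.sample(range(len(samples)), n_shuffled))
    result = []
    for i, sample in enumerate(samples):
        items = list(sample)  # copy so the input is not modified
        if i in chosen:
            rng.shuffle(items)
        result.append(items)
    return result


samples = [["a", "b", "c", "d"] for _ in range(10)]
unshuffled = shuffle_proportion(samples, r=0.0)
half_shuffled = shuffle_proportion(samples, r=0.5)
```

With `r=0.0` the output equals the input, which corresponds to the r=0% columns: a model trained this way never sees out-of-order inputs.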