	---
	license: mit
	---

	# LayoutReader

	TODO:
	1. upload models to huggingface
	2. explain why this repo
	3. explain the new dataset
	4. build docker image

	## Helper

	### Build Dataset

	```bash
	python tools.py cache-dataset-spans --help
	```

	### Train

	```bash
	bash train.sh
	```

	### Eval

	```bash
	python eval.py --help
	```

	## Span-Level Results

	Each bbox contains multiple tokens. The bboxes are usually obtained by parsing a PDF file. Training data is generated by `tools.py`.
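As a rough illustration of the span-level setup, the sketch below merges the bboxes of several tokens into one span bbox by taking their union. The `(left, top, right, bottom)` box layout and the `union_box` helper are assumptions made for illustration; they are not taken from `tools.py`.

```python
from typing import List, Tuple

# Assumed box layout: (left, top, right, bottom) in page coordinates.
Box = Tuple[int, int, int, int]


def union_box(boxes: List[Box]) -> Box:
    """Return the smallest box that encloses every box in `boxes`."""
    if not boxes:
        raise ValueError("at least one box is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


# Two token boxes on the same line merged into one span box.
span = union_box([(0, 0, 10, 5), (12, 0, 20, 6)])
```

In this sketch a span is simply the enclosing rectangle of its tokens, so a token-level sample is the special case where every span holds a single box.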

	> Only the first part of the test file is used.

	| Method                     | Shuffle | BLEU Idx | BLEU Token |
	|----------------------------|---------|----------|------------|
	| Heuristic Method           | no      | 44.4     | 70.7       |
	| LayoutReader (layout only) | no      | 95.3     | 97.8       |
	| LayoutReader (layout only) | yes     | 95.0     | 97.6       |

	## Token-Level Results

	Each bbox contains exactly one token.

	### New eval script

	> Only the first part of the test file is used.

	| Method                      | Shuffle | BLEU Idx | BLEU Token |
	|-----------------------------|---------|----------|------------|
	| Heuristic Method            | no      | 78.3     | 79.4       |
	| LayoutReader (layout only)  | no      | 98.0     | 98.2       |
	| LayoutReader (layout only)  | yes     | 97.8     | 98.0       |
	| LayoutReader (public model) | no      | 98.0     | 98.3       |

	### Old eval script (from original paper)

	* Evaluation results of LayoutReader on the reading order detection task, where the source side of the training/testing data is in left-to-right, top-to-bottom order.

	| Method                     | Encoder                | BLEU   | ARD  |
	|----------------------------|------------------------|--------|------|
	| Heuristic Method           | -                      | 0.6972 | 8.46 |
	| LayoutReader (layout only) | LayoutLM (layout only) | 0.9732 | 2.31 |
	| LayoutReader               | LayoutLM               | 0.9819 | 1.75 |
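The left-to-right, top-to-bottom ordering mentioned above can be sketched as a plain sort on box coordinates. This is only an assumption about what such an ordering might look like: the exact heuristic behind the "Heuristic Method" row is not specified here, and `heuristic_order` with its `(left, top, right, bottom)` box layout is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed box layout: (left, top, right, bottom) in page coordinates.
Box = Tuple[int, int, int, int]


def heuristic_order(boxes: List[Box]) -> List[int]:
    """Return box indices sorted top-to-bottom, then left-to-right."""
    return sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))


boxes = [
    (50, 10, 90, 20),  # first line, right
    (10, 10, 40, 20),  # first line, left
    (10, 30, 40, 40),  # second line
]
order = heuristic_order(boxes)
```

A sort like this ignores columns and other multi-region layouts, which is one plausible reason a learned model can outperform it.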

	* Input order study with left-to-right, top-to-bottom inputs in evaluation, where r is the proportion of shuffled samples in training.

	| Method                     | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
	|----------------------------|---------------|--------------|-------------|--------------|-------------|------------|
	| LayoutReader (layout only) | 0.9701        | 0.9729       | 0.9732      | 2.85         | 2.61        | 2.31       |
	| LayoutReader               | 0.9765        | 0.9788       | 0.9819      | 2.50         | 2.24        | 1.75       |
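To make the role of r concrete, the sketch below shuffles the token order of roughly a fraction r of the training samples and leaves the rest untouched. It is a minimal illustration under assumptions, not the procedure used by `train.sh` or `tools.py`: `shuffle_fraction`, its per-sample random draw, and the `seed` parameter are all hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_fraction(samples: Sequence[Sequence[T]], r: float, seed: int = 0) -> List[List[T]]:
    """Shuffle the item order of each sample with probability r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    rng = random.Random(seed)  # local generator keeps the result reproducible
    result = []
    for sample in samples:
        items = list(sample)  # copy so the caller's data is not modified
        if rng.random() < r:
            rng.shuffle(items)
        result.append(items)
    return result


samples = [[0, 1, 2, 3], [4, 5, 6], [7, 8]]
unchanged = shuffle_fraction(samples, r=0.0)
all_shuffled = shuffle_fraction(samples, r=1.0)
```

With r=0 every sample keeps its original order, and with r=1 every sample is shuffled; intermediate values shuffle each sample independently, so the realized fraction only approximates r on small datasets.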

	* Input order study with token-shuffled inputs in evaluation, where r is the proportion of shuffled samples in training.

	| Method                     | BLEU (r=100%) | BLEU (r=50%) | BLEU (r=0%) | ARD (r=100%) | ARD (r=50%) | ARD (r=0%) |
	|----------------------------|---------------|--------------|-------------|--------------|-------------|------------|
	| LayoutReader (layout only) | 0.9718        | 0.9714       | 0.1331      | 2.72         | 2.82        | 105.40     |
	| LayoutReader               | 0.9772        | 0.9770       | 0.1783      | 2.48         | 2.46        | 72.94      |