hantian / yolo-doclaynet

	---
	datasets:
	- ds4sd/DocLayNet
	language:
	- en
	tags:
	- YOLO
	- document-analysis
	---

	For more details, see the [GitHub repository](https://github.com/ppaanngggg/yolo-doclaynet).

	## Introduction

	Retrieval-augmented generation (RAG) is very popular these days, and many applications let users talk to their
	documents. However, performance drops sharply on complex documents because of their complex structure. Extracting
	content from such documents and organizing it into a parsable form is therefore a challenge. This repo aims to solve
	it with a fast, well-performing method.

	## Detection Sample

	![image](https://github.com/ppaanngggg/yolo-doclaynet/raw/main/annotated-test.png)

	## Method

	1. `YOLO` is one of the most advanced detection models, developed by [Ultralytics](https://github.com/ultralytics/ultralytics). It
	comes in 5 base model sizes and has a powerful framework for training and deployment, so I chose YOLO to
	solve this challenge.
	2. `DocLayNet` is a human-annotated document layout segmentation dataset containing 80863 pages from a broad variety of
	document sources. As far as I know, it's the highest-quality document layout analysis dataset.

	## Usage

	```python
	from ultralytics import YOLO

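	# Load the trained weights, then run detection on one page image.
	model = YOLO("{path to model file}")
	pred = model("{path to test image}")  # list with one Results object per image
	print(pred)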
	```
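
The printed output is fairly verbose, so it can help to flatten a result into plain records. The sketch below is one way to do that, assuming the Ultralytics `Results` object exposes `boxes.xyxy`, `boxes.cls`, `boxes.conf` and `names` as in recent releases. `Region`, `regions_from_result`, `reading_order` and the `min_conf` parameter are hypothetical names introduced here, and the top-to-bottom sort is a naive stand-in that only suits single-column pages.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One detected layout element (hypothetical helper type)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


def regions_from_result(result, min_conf=0.0):
    """Flatten one detection result into Region records.

    Assumes `result` looks like an Ultralytics Results object:
    `result.boxes.xyxy`, `.cls` and `.conf` support `.tolist()`,
    and `result.names` maps class ids to label strings.
    """
    boxes = result.boxes
    rows = zip(boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist())
    regions = []
    for xyxy, cls_id, conf in rows:
        if conf < min_conf:
            continue  # skip low-confidence detections
        regions.append(Region(result.names[int(cls_id)], conf, tuple(xyxy)))
    return regions


def reading_order(regions):
    """Naive order: sort by top edge, then by left edge."""
    return sorted(regions, key=lambda r: (r.box[1], r.box[0]))
```

With the model loaded as above, `reading_order(regions_from_result(pred[0]))` would list the regions of the first image from top to bottom. Multi-column layouts need a smarter ordering than this.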

	## Dataset

	More details about DocLayNet, including the download, are available at this [link](https://github.com/DS4SD/DocLayNet). It has 11 labels:

	- Text: Regular paragraphs.
	- Picture: A graphic or photograph.
	- Caption: Special text outside a picture or table that introduces this picture or
	table.
	- Section-header: Any kind of heading in the text, except overall document title.
	- Footnote: Typically small text at the bottom of a page, with a number or symbol
	that is referred to in the text above.
	- Formula: Mathematical equation on its own line.
	- Table: Material arranged in a grid alignment with rows and columns, often
	with separator lines.
	- List-item: One element of a list, in a hanging shape, i.e., from the second line
	onwards the paragraph is indented more than the first line.
	- Page-header: Repeating elements like page number at the top, outside of the
	normal text flow.
	- Page-footer: Repeating elements like page number at the bottom, outside of the
	normal text flow.
	- Title: Overall title of a document, (almost) exclusively on the first page and
	typically appearing in large font.
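
For RAG-style extraction, it is often useful to drop some of these classes before assembling the text. The sketch below lists the 11 labels as given above and filters out repeating page elements. The helper name `keep_body_regions`, the `(label, box)` tuple layout, and the choice of which labels to drop are assumptions for illustration. The exact label strings stored in a given model file should be checked (for example via `model.names`) before relying on them.

```python
# The 11 DocLayNet labels, spelled as in the list above.
DOCLAYNET_LABELS = frozenset({
    "Text", "Picture", "Caption", "Section-header", "Footnote", "Formula",
    "Table", "List-item", "Page-header", "Page-footer", "Title",
})

# Hypothetical default: treat repeating headers and footers as noise.
PAGE_FURNITURE = frozenset({"Page-header", "Page-footer"})


def keep_body_regions(detections, drop=PAGE_FURNITURE):
    """Return the (label, box) pairs whose label is not in `drop`.

    Raises ValueError on a label outside the 11 DocLayNet classes,
    which usually signals a spelling mismatch with the model's names.
    """
    detections = list(detections)
    unknown = {label for label, _ in detections} - DOCLAYNET_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [(label, box) for label, box in detections if label not in drop]
```

Passing a different `drop` set, for example one that also includes `"Footnote"`, changes which classes are discarded without touching the rest of the pipeline.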